Matching ranges of values and inserting new rows in columns

Following a previous question, I now need to dive a little deeper:
I have two tables (workspace is attached):
table1 contains the columns "duration", "WP_1", "WP_2":
table 2 contains the columns "WP1", "WP2", "WC":
I need to split WP_1 and WP_2 in table 1 according to WP1 and WP2 in table2 - what does that mean?
Duration and WC need to be matched, but the ranges of WP_1 & WP_2 and WP1 & WP2 sometimes don't fit.
Let me give you an example:
  • row3 in table1 for WP_1 and WP_2 has a range of 5 to 115 with a duration of 552.8
--> this perfectly fits into row1 of table2, where we have a range of 0 to 145 with a WC of 0
--> therefor 100% of the duration in table1 can be connected to WC 0
Problem:
  • row4 in table1 for WP_1 and WP_2 has a range of 115 to 219 with a duration of 584.7
--> does not fit in just one category:
row1 in table2: range for WP1 and WP2 is 0 to 145 with WC 0
row2 in table2: range for WP1 and WP2 is 145 to 169 with WC 1
row3 in table2: range for WP1 and WP2 is 169 to 1693 with WC 1
Solution:
  • split the WP_1 and WP_2 according to WP1 and WP2 and calculate their new duration
  • duration will be calculated with the according parts
  • WP_1 and WP_2 in table1's row4 will be split and new rows will be inserted:
--> original row4: WP_1: 115 WP_2: 219 duration: 584.7 --> 104 parts (219-115); 1 part =~ 5.62
--> new row4: WP_1: 115 WP_2:145 duration: 168.66 (30 parts (145-115) x 5.62)
--> new row5: WP_1:145 WP_2: 169 duration: 134.93 (24 parts (169-145) x 5.62)
--> new row6: WP_1:169 WP_2: 219 duration: 281.1 (50 parts (219-169) x 5.62)
the new table1 should look like that:
The main problem is the matching and splitting part - calculating the new durations should be quite easy.
Unfortunately as of writing this I have no clue how to match and split - any hints are very much appreciated!
Thanks for your help!

 Respuesta aceptada

Once you know that table2 is sorted in ascending order and that there are no overlaps and no shenanigans, you can map the positions of table1 entries in table2 windows.
% this will generate a rather large matrix which maps the entries in table1
% to windows in table2 which start before they start.
% the last true flag in each row will be the starting WP1 for this entry
wp1Mask = table1.WP_1 >= table2.WP1';
the same trick can be done with WP2 only using less than because we want to find the window in table2 which ends after the table1 entry ends
After you finish mapping everything up, you can start correlating.
I normally prefer using logical indices over using find, because in most cases it is not necessary.
The problem is find doesn't work on a specific dimention in a matrix like other functions do, so you can either do it with a loop, or you can use this nifty trick which was proposed by Matt J in another post:
% this will find the first index of a table2 window that starts before each
% table1 entry - not what we want actually, but could be useful for WP2
[~, iWP1] = max(wp1Mask, [], 2);
% to achieve what we seek we can use cumsum combined with the same trick
% this will find the index of the last table2 window which starts before each table1
% entry - what we seek.
[~, iWP1] = max(cumsum(wp1Mask, 2), [], 2);
Once you know these indices, we can find which table1 entries fit only one window and which fit in several windows, and then split those
hope this is useful

7 comentarios

Hey TADA, thanks for your answer.
Currently looking into it, but it's pretty hard to understand for me - when looking at the indices, I'm not quite sure where it tells me which table1 entries fit multiple windows - as much as I understand it only tells me the row it fits into..
Thanks for your help!
When the index of wp2 is equal to the index of wp1 the entry fits into one window. But when the wp2 index is greater than wp1 index, you have to split it into the range of indices
Thank you, that explains it little better. So I'm running:
wp1Mask = table1.WP1 >= table2.WP1';
wp2Mask = table1.WP2 <= table2.WP2';
[~, iWP1] = max(cumsum(wp1Mask, 2), [], 2);
[~, iWP2] = max(cumsum(wp2Mask, 1), [], 2);
which gives me the indices; looking at the first problem which is row4 I get a value of 1 for iWP1 and a value of 3 for iWP2 --> needs to be split.
But how do these values define the range it needs to be split into? Sorry, I am by no means a mathematician or a programmer - so I might be a little slow..
In the case of WP2, you want the first index where the table2 window ends after the table1 entry ends.
% this will give you the LAST index where table2.WP2 is larger or equal to table1.WP_2
[~, iWP2] = max(cumsum(wp2Mask, 1), [], 2);
% this will give you the FIRST index where table2.WP2 is larger or equal to table1.WP_2
% so this is the line of code you need.
[~, iWP2] = max(wp2Mask, [], 2);
remember, these are indices of rows in table2, so you need to split row4 in table1 into 3 rows, each should fit inside a single row in table2, like you described in your original post.
Thank you again for your answer, this is what I came up (which is not working) but maybe you can tell me if I'm even close to going in the right direction:
wp1Mask = table1.WP1 >= table2.WP1';
wp2Mask = table1.WP2 <= table2.WP2';
[~, iWP1] = max(cumsum(wp1Mask, 2), [], 2);
[~, iWP2] = max(wp2Mask, [], 2);
for x = 1:size(iWP1)
if iWP1(x) < iWP2(x)
WP1 = table1.WP1(x);
WP2 = table1.WP2(x);
DURpart = table1.duration / (WP2 - WP1);
n = iWP2(x)-iWP1(x);
for y = 0:n
if y == 0
newWP1 = [table1.WP1(x+y);table1.WP1(x);table1.WP1(~(x+y))];
newWP2 = [table1.WP2(x+y);table2.WP2(iWP1(x));table1.WP2(~(x+y))];
else
newWP1 = [table1.WP1(x+y);table1.WP2(x+y-1);table1.WP1(~(x+y))];
if WP2 <= table2.WP2(iWP2(x)-iWP1(x)+1)
newWP2 = [table1.WP2(x+y);WP2;table1.WP2(~(x+y))];
else
newWP2 = [table1.WP2(x+y);table.WP2(iWP2(x)-iWP1(x)+1);table1.WP2(~(x+y))];
end
table1{:,[1,2,3]} = [0,newWP1, newWP2];
for z = x:x+n
table1.duration(z) = DURpart * (table1.WP2(z) - table1.WP1(z));
end
end
end
end
end
My thought process:
We go through the indices and everytime iWP2 is greater than iWP1 we want to create y = 0:(iWP2(x)-iWP1(x)) new rows. For these rows we need to define the new WP1 and WP2 and duration.
The first step is the location. The location is defined by the amount of new rows we need and the row we encountered the problem. So the location is x+y.
The second step would be the value. That's where I am already struggling.
The new WP1 of the added row would be the new WP2 of the new-old row.
But the WP1 of the new-old row would stay the same. How can I define that in one go, or do I need to add if y = 0. Even if I added if y = 0, that would change the values I'd need to calculate the duration. Therefor I'm assuming that needs to be done as a first step. Define WP1 and WP2 before we change that row and calculate the duration parts.
Next we set if y = 0, we have a special case where WP1 is defined as the WP1 we already had, and WP2 as the border of table2 range we are hitting.
If y > 0, the WP1 of the inserted row is defined as the WP2 of the previous row.
The new WP2 needs a new if clause, to check if it's smaller or equal the next border we could possibly hit. If it is smaller or equal the WP2 of this row is the WP2 of the original row, we replaced.
Finally if it is bigger, the WP2 is the border we hit at table2.
We then add the new row, where the colmn of duration has a value of 0, which will be calculated in the last step.
Is this somehow comprehensive?
I've been at it for days and am still struggling really hard - this is a key step I need to solve to proceed with my work!
Thanks for reading and your help!
Addmitedly, I didn't read nor run your code, but your thaugt process sounds about right.
There is a simpler way though.. you don't need to test each record, because the indices already define the start and end positions in table2 of each table1 record that needs to be split up.
lets take row 4 which you mentioned as an example:
it starts at 115 and ends at 219.
the corresponding "windows" in table2 are rows 1 (0-145), 2 (145-169) and 3 (169-1693).
You already know from the indices that you calculated that it starts at row 1 and ends at row 3:
iWP1(4) % this should equal 1
iWP2(4) % this should equal 3
These are the indices of the corresponding rows in table2, and they should define the start and end positions of each new row if I understand the idea correctly.
% obviously in your code you would not hard-code 4, but use a loop index
% or something
newValues = table2(iWP1(4):iWP2(4), {'WP1', 'WP2', 'WC'});
next you need to change the WP1 of the first row and WP2 of the last row according to the original values you had from table1 (115 and 219 in stead of 0 and 1693).
now, all that is left is to recalculate the duration accordingly and replace the original row with these three rows.
You may want to add these three rows at the end of the table, and remove all the divided rows together at the end, once table editing is done, as changing the original indices of the rows you need to split will make your life much harder, and is bound to result in nightmarish bugs.
if the order of rows in table 1 is important you can always sort the table after you are done, or instead build up a separate output table as you go along.
Lukas Netzer
Lukas Netzer el 20 de Ag. de 2021
Editada: Lukas Netzer el 20 de Ag. de 2021
edit:
I got it:
wp1Mask = table1.WP1 >= table2.WP1';
wp2Mask = table1.WP2 <= table2.WP2';
[~, iWP1] = max(cumsum(wp1Mask, 2), [], 2);
[~, iWP2] = max(wp2Mask, [], 2);
xIndex = iWP2 > iWP1;
d=find(xIndex);
table_new = table1;
for x = 1:size(table1)
if iWP1(x) == iWP2(x)
continue
elseif iWP1(x) < iWP2(x)
newValues = table2(iWP1(x):iWP2(x), {'WP1', 'WP2'});
newValues.duration(1:size(newValues)) = 0;
newValues = movevars(newValues, 'duration', 'Before', 'WP1');
newValues.WP1(1) = table1.WP1(x);
newValues.WP2(end) = table1.WP2(x);
DurPart = table1.duration(x) / (table1.WP2(x) - table1.WP1(x));
newValues.duration = DurPart .* (newValues.WP2 - newValues.WP1);
table_new = [table_new;newValues];
end
end
table1 = table_new;
table1(d,:) = [];
table1 = sortrows(table1,2);
THANK YOU VERY MUCH FOR YOUR INPUT - may all the best come to you! Have a nice weekend!

Iniciar sesión para comentar.

Más respuestas (0)

Categorías

Productos

Versión

R2021a

Preguntada:

el 17 de Ag. de 2021

Editada:

el 20 de Ag. de 2021

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by