How to segregate data?

Wesser on 6 Jul 2021
Answered: dpb on 6 Jul 2021
I have a dataset of people's (columns) residential history over time (rows, years).
  • numbers 1-3 mean the person lived in one of three sections of TownX,
  • numbers 4-7 mean the person lived in one of four sections of TownY,
  • numbers 8-9 mean the person lived in one of two sections of TownZ,
  • number 10 means the person lived in TownA
  • number 11 means the person wasn't born yet,
  • number 0 means the person lived outside of the three-town community.
I am trying to ID the people that:
  • only lived in TownX over time, so combinations of 11,1,2,3 = Xres
  • only lived in TownY over time, so combinations of 11,4,5,6,7 = Yres
  • only lived in TownZ over time, 11,8,9 = Zres
  • only lived in TownA over time, 11, and 10 = Ares
  • lived in any combination of the towns and/or lived outside the community, so mix of numbers 0-11 = Wander.
I have attached the data.
I think I can pull this together with a massive ifelse statement, but is there a more elegant way of going about this?
Many thanks for your time and thoughts!!
  3 comments
Wesser on 6 Jul 2021
The data in cells E5-HC102 are what's pertinent. Columns E-HC represent the people in the study and rows 5-102 are increments of time (years).
As an end result, I'd like each column (person) to be IDed as Xres, Yres, Zres, Ares or Wander.
Wesser on 6 Jul 2021
This is my stab at the above problem (beyond the ifelse boondoggle)
%the 1 in the below lines tells matlab to do the ANY search by column
Xres = any(Zone<4,1) & any(Zone>0,1) | any(Zone==11,1); %lived in zones 1,2,3 or wasn't born yet (11)
Yres = any(Zone>3,1) & any(Zone<8,1) | any(Zone==11,1); %lived in zones 4,5,6,7 or wasn't born yet (11)
Zres = any(Zone>7,1) & any(Zone<10,1) | any(Zone==11,1); %lived in zones 8,9 or wasn't born yet (11)
Ares = any(Zone==10,1) | any(Zone==11,1); %lived in zone 10 or wasn't born yet (11)
Wander = ....no clue how to ID all the people who did not fall into one of the above 4 categories.


Answers (1)

dpb on 6 Jul 2021
Something similar to that looks to me to be about the best that can be done -- I'd probably write some helper functions similar to
isX=@(x)all(iswithin(x,1,3)|x==11);
isY=@(x)all(iswithin(x,4,7)|x==11);
etc., where the helper function iswithin makes the conditional more readable at the higher level. It's just the logical expression moved to cleaner syntax by hiding the combinational logic:
function flg=iswithin(x,lo,hi)
% returns T for values within range of input
% SYNTAX:
% [log] = iswithin(x,lo,hi)
% returns T for x between lo and hi values, inclusive
flg= (x>=lo) & (x<=hi);
>>
As for the Wanderer, once you have everybody else classified, it's just whoever is left over --
isW=@(x)~(isX(x)|isY(x)|isZ(x)|isA(x));
You can, of course, add another category that is easy which would be the permanent Outside resident which would be
isO=@(x)all(x==0);
if that might also be of interest.
Then, of course, a Wanderer would be those who did live in more than one location in the identified townships.
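Putting the pieces above together, here is a minimal sketch on a small made-up matrix. The matrix Zone below is invented sample data rather than the attached file, inRange is an inline stand-in for iswithin, and isZ and isA are filled in by analogy with isX and isY, so treat them as assumptions. Note the use of all rather than any: every year in a column has to satisfy the condition.

```matlab
% Invented sample: rows are years, columns are people (not the real data).
Zone = [11  4  8 10  1  0;
         1  5  9 10  2  0;
         2  7  8 10  5  0;
         3  6  9 10  0  0];

% Inline stand-in for the iswithin helper (inclusive range test).
inRange = @(x,lo,hi) (x>=lo) & (x<=hi);

% all(...,1) works down each column, so every year must pass the test.
isX = @(x) all(inRange(x,1,3) | x==11, 1);
isY = @(x) all(inRange(x,4,7) | x==11, 1);
isZ = @(x) all(inRange(x,8,9) | x==11, 1);   % assumed by analogy
isA = @(x) all(x==10          | x==11, 1);   % assumed by analogy

% One logical row per class, one element per person.
tf = [isX(Zone); isY(Zone); isZ(Zone); isA(Zone)];

% Start everyone as a Wanderer, then overwrite those matching a class.
label = repmat("Wander", 1, size(Zone,2));
names = ["Xres" "Yres" "Zres" "Ares"];
for k = 1:numel(names)
    label(tf(k,:)) = names(k);
end
disp(label)
```

One edge case to be aware of: a column containing only 11 passes all four tests, and in this sketch the last assignment (Ares) would win, so such columns may need separate handling.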
There's somewhat of a discrepancy, though, as you have the definition of
  • number 0 means the person lived outside of the three-town community.
but there are four identified communities altogether. So, could a "0" be the same individual also coded as a "10", or are they known to be exclusive?
A lot of the data in the file are NaN, and the last column imports as char(), so I couldn't test all the data -- I cut it down with
tZone=tZone(:,5:end-1); % save only the coded columns and not last char()
tZone=tZone(:,(all(isfinite(tZone{:,:})))); % get rid of everything NaN
leaves with
>> whos tZone
  Name        Size             Bytes  Class    Attributes
  tZone      98x207           208396  table
Applying the above for X resulted in
>> idX=varfun(isX,tZone);
>> any(idX{:,:})
ans =
logical
1
>> sum(idX{:,:})
ans =
3
>> find(idX{:,:})
ans =
43 44 105
>>
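For reference, the same cleanup and varfun steps can be tried on a tiny synthetic table. This is only a sketch: the variable names P1 to P4 and the values are invented, and the range test is written inline instead of calling iswithin.

```matlab
% Invented sample table: rows are years, variables are people.
tZone = table([11;1;2], [4;NaN;5], [1;3;2], [2;6;0], ...
    'VariableNames', {'P1','P2','P3','P4'});

% Drop every variable (person) that contains a NaN or Inf.
tZone = tZone(:, all(isfinite(tZone{:,:}), 1));

% Same test as isX above: only zones 1-3 or "not born yet" (11).
isX = @(x) all(((x>=1) & (x<=3)) | x==11);

% varfun applies isX to each variable and returns a one-row table.
idX = varfun(isX, tZone);

% Recover the original names of the people flagged as X residents.
xNames = tZone.Properties.VariableNames(idX{1,:});
disp(xNames)
```

Indexing the variable names with the logical row gives the people directly, which is often easier to read than the column numbers returned by find.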
Hope that helps...
OH! One other thing one could do is create a categorical array and add names to the codes for identification; alternatively, I used a lookup table --
Code=0:11;
Area=[1,2*ones(1,3),3*ones(1,4),4*ones(1,2),5,6];
Town=["Wanderer","TownX","TownY","TownZ","TownA","Unborn"];
can be used like
>> id=interp1(Code,Area,randi([0 11],10,1));
>> Town(id)
ans =
1×10 string array
"TownX" "TownY" "Unborn" "TownX" "TownY" "TownY" "TownX" "TownX" "TownA" "TownY"
>>
to identify individual elements in a legible manner. As noted, a categorical array (ordinal, if desired) could do the same, with display names defined to go with the values.
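A rough sketch of the categorical route, reusing the Code, Area and Town vectors from above. The sample vector z is invented, and the sketch relies on categorical merging codes that share a duplicated name in its third argument, which is how I understand the documented behaviour.

```matlab
% Lookup vectors from the answer above.
Code = 0:11;
Area = [1,2*ones(1,3),3*ones(1,4),4*ones(1,2),5,6];
Town = ["Wanderer","TownX","TownY","TownZ","TownA","Unborn"];

% Invented sample codes for one person over six years.
z = [11; 1; 3; 5; 10; 0];

% Town(Area) has one name per code; duplicated names merge several
% codes (e.g. 1, 2 and 3) into a single category.
c = categorical(z, Code, Town(Area));

disp(c')
disp(categories(c)')
```

The result displays town names instead of numeric codes, while comparisons such as c=="TownX" still work element by element.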

Release

R2021a
