Remove rows in an array containing a non-matching element

astein on 11 Apr 2017
Commented: astein on 16 Apr 2017
I have a datafile data.txt:
gene12 489 483 838
gene82 488 763 920
gene31 974 837 198
gene45 489 101 378
gene59 89 827 138
I have another data file genelist.txt that lists just genes I'm interested in for my study:
gene45
gene59
gene61
I want to modify the first dataset by removing all rows where the gene isn't found in the second list so basically end up with this array:
gene45 489 101 378
gene59 89 827 138
How do I go about doing this?

Accepted Answer

Guillaume on 11 Apr 2017
Probably the easiest:
geneswithdata = readtable('data.txt'); %load file as a table
geneswithdata.Properties.VariableNames{1} = 'genes'; %rename first column for clarity (optional).
%I would also rename all the other columns
genesonly = readtable('genelist.txt'); %load as a table
genesonly.Properties.VariableNames = {'genes'}; %rename columns. Common columns must have the same name
filteredgenes = innerjoin(genesonly, geneswithdata);
Done.
Using ismember that last line could be done as:
found = ismember(geneswithdata.genes, genesonly.genes); %compare the gene columns only; ismember on two tables requires identical variables
filteredgenes = geneswithdata(found, :);
Using intersect (rather than setdiff) it could be done as:
[~, tokeep] = intersect(geneswithdata.genes, genesonly.genes); %tokeep holds the row indices into geneswithdata
filteredgenes = geneswithdata(tokeep, :);
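For anyone who wants to try the three variants without creating the files first, here is a minimal sketch that builds both tables inline. The values are copied from the question; the column names c1, c2 and c3 are placeholders made up for this example. Note that ismember and intersect are applied to the gene columns rather than to the whole tables, since the two tables do not have the same variables.

```matlab
% Inline stand-ins for data.txt and genelist.txt (values from the question).
% The column names c1..c3 are hypothetical, chosen only for this sketch.
genes = {'gene12'; 'gene82'; 'gene31'; 'gene45'; 'gene59'};
c1 = [489; 488; 974; 489; 89];
c2 = [483; 763; 837; 101; 827];
c3 = [838; 920; 198; 378; 138];
geneswithdata = table(genes, c1, c2, c3);
genesonly = table({'gene45'; 'gene59'; 'gene61'}, 'VariableNames', {'genes'});

% 1) innerjoin keeps only the rows whose key appears in both tables
byjoin = innerjoin(genesonly, geneswithdata);

% 2) ismember on the key column gives a logical mask over the rows
found = ismember(geneswithdata.genes, genesonly.genes);
bymask = geneswithdata(found, :);

% 3) intersect returns the row indices (into the first input) of the common keys
[~, tokeep] = intersect(geneswithdata.genes, genesonly.genes);
byindex = geneswithdata(tokeep, :);

disp(bymask)   % shows the gene45 and gene59 rows
```

All three results hold the same two rows here. gene61 is in the list but not in the data, so it simply does not appear in any of them.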
  3 comments
Guillaume on 12 Apr 2017
By default, readtable considers the first line as a header line that is to be used to name the variables. To tell it to not do that:
readtable(___, 'ReadVariableNames', false)
readtable is extremely flexible. Look at its documentation to see all the options available.
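As a rough illustration of that option, the sketch below writes a small headerless file to the temp folder and reads it back. The file name data_example.txt and the column names a, b and c are placeholders, not anything from the original files, and the explicit 'Delimiter' is an assumption added so the result does not depend on automatic delimiter detection.

```matlab
% Write a small headerless, space-separated file so the sketch is self-contained.
% File name and column names below are hypothetical.
fname = fullfile(tempdir, 'data_example.txt');
fid = fopen(fname, 'w');
fprintf(fid, 'gene12 489 483 838\ngene82 488 763 920\ngene45 489 101 378\n');
fclose(fid);

% No header line in the file, so tell readtable not to use the first row as names.
% The delimiter is given explicitly (assumption: fields are separated by one space).
t = readtable(fname, 'ReadVariableNames', false, 'Delimiter', ' ');

% readtable assigns default names (Var1, Var2, ...); replace them with clearer ones
t.Properties.VariableNames = {'genes', 'a', 'b', 'c'};

disp(t)
delete(fname);   % clean up the temporary file
```

Without 'ReadVariableNames', false the first gene row may be consumed as the header, which is an easy way to lose a row without noticing.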
astein on 16 Apr 2017
Thank you very much for the help!


More Answers (1)

Image Analyst on 11 Apr 2017
Look into ismember() or setdiff()
  1 comment
astein on 11 Apr 2017
Edited: astein on 11 Apr 2017
I don't know how to use either for this purpose. Isn't setdiff() going to give me the genes the two lists don't have in common? I want the genes they do have in common. ismember() gives me a logical array, but then I run into the same issue: how do I use that array to pull out only the rows that are "true"? I am also having difficulty manipulating the datasets (i.e., which format to load the txt files into: structure, table, etc.).
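On the logical-array part of that question, a minimal sketch with plain arrays (no tables) may help. The variable names are made up for illustration and the values are the ones from the question. The logical vector returned by ismember is used directly as a row index, and setdiff is shown only to report which requested genes are absent from the data.

```matlab
% Gene names and their numeric data kept in parallel arrays (hypothetical layout)
genes  = {'gene12'; 'gene82'; 'gene31'; 'gene45'; 'gene59'};
values = [489 483 838; 488 763 920; 974 837 198; 489 101 378; 89 827 138];
wanted = {'gene45'; 'gene59'; 'gene61'};

% keep(i) is true when genes{i} appears somewhere in wanted
keep = ismember(genes, wanted);

% A logical vector used as an index selects exactly the rows marked true
keptgenes  = genes(keep);
keptvalues = values(keep, :);

% setdiff answers the opposite question: which wanted genes have no data at all
notindata = setdiff(wanted, genes);

disp(keptgenes)
disp(keptvalues)
```

The same indexing pattern works on a table, e.g. T(keep, :), once keep has one element per row.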
