Removing duplicate edges?

Sawyer Smith on 24 Oct 2016
Answered: Christine Tobler on 2 Nov 2016
Hey all, I'm very new to MATLAB and am trying to use it to generate protein-protein interaction networks. The first dataset I used went through just fine, but I'm having some trouble with my second one.
Gene1 'MAP2K4' 'MYPN' 'ACVR1' 'GATA2' 'RPA2'
Gene2 'FLNC' 'ACTN2' 'FNTA' 'PML' 'STAT3'
I have two cell arrays like the ones shown above, where element 1 of Gene1 interacts with element 1 of Gene2 (they form an edge), and so on. The first time I did this it was easy to just use the graph function: graph(Gene1,Gene2). But my second dataset has duplicate edges (usually the same pair flipped around), and when I call graph I get an error mentioning "matlab.internal.graph.MLGraph". I'm not sure how to go about removing the duplicates.
If you need any more info from me I will be happy to provide it, just starting out and not sure how to say it all quite yet though :)
2 comments
Jiro Doke on 25 Oct 2016
Maybe you can provide a small example which reproduces the error.
Sawyer Smith on 25 Oct 2016
gene3 = {'abc'; 'bcd'; 'def'};
gene4 = {'def'; 'efg'; 'abc'};
graph(gene3, gene4)
Should do it (not at a place where I can check it, but pretty sure).


Answers (3)

Christine Tobler on 2 Nov 2016
If your duplicates always come as one A->B edge and one B->A edge, Alexandra's elegant solution will work very well.
If you also have duplicate edges that are exactly the same (say A->B appears twice), you need a workaround, because unique(..., 'rows') is not supported for cell arrays. The simplest is probably to convert your cell array to a categorical array, apply unique(..., 'rows'), and convert back with cellstr. Then you can apply Alexandra's solution to this new array:
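A = [Gene1(:), Gene2(:)];                           % one edge per row: {source, target}
uniqueA = cellstr(unique(categorical(A), 'rows'));  % drop rows that repeat exactly
d = digraph(uniqueA(:,1), uniqueA(:,2));            % A->B and B->A are still distinct here
m = full(adjacency(d));
g = graph(m|m',d.Nodes);                            % symmetrize and reuse the node names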

Alexandra Harkai on 25 Oct 2016
A graph-driven workaround could be:
d = digraph(Gene1,Gene2);
m = full(adjacency(d));
g = graph(m|m',d.Nodes);
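This creates a digraph in which A->B and B->A are different directed edges, then ORs the adjacency matrix with its transpose so the result is symmetric, and finally builds an undirected graph from that matrix, attaching the node names taken from the original digraph.
Whether this is suitable depends on how large your data is and how fast it needs to run: the full adjacency matrix grows with the square of the number of nodes, so this may not be the most efficient solution.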
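As a minimal sketch of the approach above, here it is applied to the small reproduction posted in the question's comments (gene3 and gene4 are that example's data, not the real dataset). The pair 'abc'/'def' appears in both orders, so two undirected edges should remain.

```matlab
% Reproduction data from the question's comments:
% the pair 'abc'/'def' appears once in each direction.
gene3 = {'abc'; 'bcd'; 'def'};
gene4 = {'def'; 'efg'; 'abc'};

d = digraph(gene3, gene4);    % abc->def and def->abc are distinct directed edges
m = full(adjacency(d));       % 0/1 adjacency matrix of the digraph
g = graph(m | m', d.Nodes);   % symmetric logical matrix -> undirected graph with node names

disp(g.Edges)                 % list the remaining undirected edges
```

Because the OR of the matrix with its transpose holds a single entry per node pair, the flipped duplicate collapses into one edge.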

Alexandra Harkai on 25 Oct 2016
This seems to be a similar question, only with numeric graph node names.
If you have R2016b, you can treat Gene1 and Gene2 as string arrays and sort them the same way.
1 comment
Sawyer Smith on 25 Oct 2016
Edited: Sawyer Smith on 25 Oct 2016
Mmk, that is pretty dang similar. But I have R2016a, and the sort function doesn't seem to work on my arrays :( At least not when I try to sort by rows.
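One possible way to get the sort-each-pair idea without string arrays is to map the names to integer IDs first, since sort and unique(..., 'rows') both work on numeric matrices. This is only a sketch using the small example from the question's comments; the variable names idx, names and pairs are made up for illustration, and it has not been checked on R2016a specifically.

```matlab
gene3 = {'abc'; 'bcd'; 'def'};
gene4 = {'def'; 'efg'; 'abc'};

% Map every gene name to an integer ID; names(idx) reproduces the stacked input.
[names, ~, idx] = unique([gene3(:); gene4(:)]);
pairs = reshape(idx, [], 2);             % column 1: IDs from gene3, column 2: IDs from gene4

% Put the smaller ID first in each row, then drop repeated rows.
% This removes both flipped duplicates (A-B / B-A) and exact duplicates.
pairs = unique(sort(pairs, 2), 'rows');

% Build the undirected graph from ID pairs, with no weights, named by 'names'.
g = graph(pairs(:,1), pairs(:,2), [], names);
```

Unlike the adjacency-matrix route, this never builds a full matrix, so it may scale better for large interaction lists.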
