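Merging several graphml files in networkx and removing duplicates

I'm new to programming, Python and networkx (ouch!) and am trying to merge four graphml files into one and remove the duplicate nodes, following the excellent instructions here: "Merging several graphml files in networkx and removing duplicates".

However, I can't figure out how to keep track of the duplicate nodes when there are four files to compare instead of two. The code I've written below doesn't work, but hopefully you can see where my thinking goes wrong and help me.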

# script to merge one or more graphml files into one single graphml file 

# First read graphml-files into Python and Networkx (duplicate variables as necessary) 
A = nx.read_graphml("file1.graphml") 
B = nx.read_graphml("file2.graphml") 
C = nx.read_graphml("file3.graphml") 
D = nx.read_graphml("file4.graphml") 

# Create a new graph variable containing all the previous graphs 
H = nx.union(A,B,C,D, rename=('1-','2-','3-','4-')) 

# Check what nodes are in two or more of the original graphml-files 
duplicate_nodes_a_b = [n for n in A if n in B] 
duplicate_nodes_b_c = [n for n in B if n in C] 
duplicate_nodes_c_d = [n for n in C if n in D] 

all_duplicate_nodes = # How should I get this? 

# remove duplicate nodes 
for n in all_duplicate_nodes: 
    n1='1-'+str(n) 
    n2='2-'+str(n) 
    n3='3-'+str(n) 
    n4='4-'+str(n) 
    H.add_edges_from([(n1,nbr)for nbr in H[n2]]) # How can I take care of duplicates_nodes_b_c, duplicates_nodes_c_d? 
    H.remove_node(n2) 

# write the merged graphml files-variable into a new merged graphml file 
nx.write_graphml(H, "merged_file.graphml", encoding="utf-8", prettyprint=True)

Source

2013-06-18 mattiasostmar

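If you have a working way to merge two graphs, you can apply it pairwise and then apply it again to the merged graphs. As for merging the duplicate lists: 'duplicates_a_b = set(n for n in A if n in B)' and so on, then 'all_duplicates = duplicates_a_b | duplicates_b_c | ...'. – drevicko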
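The set-union idea in this comment can be sketched as follows. Note that checking only the neighbouring pairs (A with B, B with C, C with D) would miss a node shared by, say, A and C, so this sketch loops over every pair. The helper name `all_duplicate_nodes` is hypothetical, and plain sets stand in for the graphs here; since iterating a networkx graph yields its nodes, the same function should accept the graphs themselves.

```python
from itertools import combinations

def all_duplicate_nodes(node_collections):
    """Return every node that occurs in at least two of the collections."""
    duplicates = set()
    # combinations(..., 2) yields every unordered pair exactly once
    for first, second in combinations(node_collections, 2):
        duplicates |= set(first) & set(second)
    return duplicates

# Plain sets standing in for the node sets of the four graphs (made-up data)
A = {"a", "b", "c"}
B = {"c", "d"}
C = {"e", "a"}
D = {"f"}

duplicates = all_duplicate_nodes([A, B, C, D])
print(sorted(duplicates))
```

With four graphs this means six pairwise comparisons, which is one reason counting occurrences per node can be simpler.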

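First, note that the way you are using nx.union is not what you want. You really need to call it with just two graphs. But handling the duplicates that way gets complicated, because you have to consider every possible pair of graphs to see how a node might be duplicated.

Instead, let's be more direct and simply count how many graphs each node appears in. This is easy with a Counter: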
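As a side note, networkx also provides nx.union_all, which takes a list of graphs and one rename prefix per graph. A minimal sketch with made-up toy graphs (not the asker's files) shows the prefixed result that the question's nx.union call seems to be aiming for:

```python
import networkx as nx

# Toy graphs with overlapping node names (illustrative data only)
A = nx.Graph([("a", "b")])
B = nx.Graph([("b", "c")])
C = nx.Graph([("c", "d")])
D = nx.Graph([("a", "d")])

# One prefix per graph, so shared names no longer clash in the union
H = nx.union_all([A, B, C, D], rename=("1-", "2-", "3-", "4-"))
print(sorted(H.nodes()))
```

Every node survives under its prefix here, so the duplicates still have to be dealt with afterwards.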

import collections 
ctr = collections.Counter() 
for G in [A, B, C, D]: 
    ctr.update(G)

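Now use the counter to find the nodes that appear only once:

singles = {x for (x,n) in ctr.items() if n == 1}

With this set of nodes, we can compute the subgraphs that contain only the non-duplicated nodes: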

E = nx.union(A.subgraph(singles), B.subgraph(singles)) 
F = nx.union(C.subgraph(singles), D.subgraph(singles)) 
H = nx.union(E, F)

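The graph H now has all four initial graphs merged, with the duplicates removed.

The approach I've shown creates several intermediate graphs, so you may run into memory problems with large input graphs. If so, a similar approach is to determine the set of duplicated nodes, remove those nodes from the original graphs, and then take the union without keeping all the intermediates. It looks like: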
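To make the effect concrete, here is the same recipe run end to end on four small made-up graphs (the node names are purely illustrative). It assumes Python 3, hence `items()`:

```python
import collections
import networkx as nx

# Toy inputs: "x" occurs in A and B, "e" occurs in C and D
A = nx.Graph([("a", "b"), ("b", "x")])
B = nx.Graph([("c", "x")])
C = nx.Graph([("d", "e")])
D = nx.Graph([("e", "f")])

ctr = collections.Counter()
for G in [A, B, C, D]:
    ctr.update(G)  # iterating a graph yields its nodes

# Nodes that occur in exactly one graph
singles = {node for node, count in ctr.items() if count == 1}

E = nx.union(A.subgraph(singles), B.subgraph(singles))
F = nx.union(C.subgraph(singles), D.subgraph(singles))
H = nx.union(E, F)

print(sorted(H.nodes()))
print(sorted(H.edges()))
```

Note that "x" and "e" are dropped completely, along with every edge touching them, so only the edge between "a" and "b" is left in H.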

import collections 
import networkx as nx 

ctr = collections.Counter() 
for G in [A, B, C, D]: 
    ctr.update(G) 

duplicates = {x for (x,n) in ctr.items() if n > 1} 
H = nx.Graph() 
for G in [A, B, C, D]: 
    G.remove_nodes_from(duplicates) # change graphs in-place 
    H = nx.union(H, G)

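Both approaches take advantage of the fact that NetworkX functions generally accept extra nodes that are not in the graph and silently ignore them.

Source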
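A small sketch of that behaviour, using a made-up graph and a node name ("zzz") that is deliberately absent from it:

```python
import networkx as nx

G = nx.Graph([("a", "b"), ("b", "c")])

# subgraph() skips requested nodes that are not in G
sub = G.subgraph({"a", "b", "zzz"})
print(sorted(sub.nodes()))

# remove_nodes_from() also skips missing nodes without complaint
G.remove_nodes_from(["c", "zzz"])
print(sorted(G.nodes()))

# By contrast, the single-node remove_node() raises for a missing node
try:
    G.remove_node("zzz")
    raised = False
except nx.NetworkXError:
    raised = True
print(raised)
```

This is why the same `singles` or `duplicates` set can be passed to every graph without first filtering it per graph.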

2013-06-19 14:02:42

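Interesting stuff with the different approaches. Both end up with the same number of nodes, but the number of edges left in the merged graphml file differs. – mattiasostmar

I also tried four different graphml files made up of tweets from completely different Twitter searches. The merged file ends up with fewer nodes than any of the individual files. Something must be wrong. – mattiasostmar

Do you have duplicate nodes within a single graphml file? (What does 'nx' do in that case? Maybe 'nx' is merging the nodes? You could run some tests...) What type of object are your nodes? Is it the same in all your tests? (That could change how equality, and therefore duplication, is measured.) – drevicko

If the graphml files are simple (no weights, attributes, etc.), it may be easier to work at the text level. For example,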
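One thing that may be worth checking here: the approaches in the answer drop every node that occurs in more than one file, rather than keeping a single copy of it. If the goal is instead to keep one copy of each shared node, nx.compose_all is one option. This is only a sketch on made-up graphs and assumes that equal node identifiers really do mean the same node:

```python
import networkx as nx

# Toy inputs: "x" occurs in both graphs
A = nx.Graph([("a", "b"), ("b", "x")])
B = nx.Graph([("c", "x")])

# compose_all merges by node identity, so "x" appears once in the result
merged = nx.compose_all([A, B])

print(sorted(merged.nodes()))
print(merged.number_of_edges())
```

Here "x" is kept once and retains its edges from both inputs, which is a different result from removing duplicated nodes outright.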

cat A.graphml B.graphml C.graphml | sort -r | uniq > D.graphml

This keeps the unique set of nodes and edges from the three graphml files. Afterwards you can use a text editor to rearrange the <graph>, </graph>, <graphml ...>, </graphml> tags in D.graphml.

Source

2015-04-20 13:14:52 yya
