表示線與numpy的陣列

，我在下面的格式接收數據：表示線與numpy的陣列

tail head 
P01106 Q09472 
P01106 Q13309 
P62136 Q13616 
P11831 P18146 
P13569 P20823 
P20823 P01100 
...

有沒有格式化這個數據與numpy的陣列的圖形的好方法嗎？我希望使用這個圖來計算PageRank。

到目前爲止，我有

import numpy as np 
data = np.genfromtxt('wnt_edges.txt', skip_header=1, dtype=str)

我想使用從Representing graphs (data structure) in Python圖形數據結構，但它似乎沒有什麼意義在這種情況下，因爲我會做矩陣乘法。

來源

2017-04-25 Simon

你有沒有考慮過使用'networkx'？您可以輕鬆地將您的陣列轉換爲邊緣列表。 – DyZ

爲避免重新發明輪子，應按照註釋和其他答案中的建議使用networkx。

如果出於教育目的，你想想要重新發明輪子，你可以創建一個adjacency matrix。可以從該矩陣計算PageRank：

PageRank值是修改後的鄰接矩陣的主導右特徵向量的條目。

由於鄰接矩陣中的每個行/列代表一個節點，則需要枚舉節點，以便每個節點通過從0

import numpy as np 

data = np.array([['P01106', 'Q09472'], 
       ['P01106', 'Q13309'], 
       ['P62136', 'Q13616'], 
       ['P11831', 'P18146'], 
       ['P13569', 'P20823'], 
       ['P20823', 'P01100']]) 


nodes = np.unique(data) # mapping node name --> index 
noidx = {n: i for i, n in enumerate(nodes)} # mapping node index --> name 

n = nodes.size # number of nodes 

numdata = np.vectorize(noidx.get)(data) # replace node id by node index 

A = np.zeros((n, n)) 
for tail, head in numdata: 
    A[tail, head] = 1 
    #A[head, tail] = 1 # add this line for undirected graph

這導致開始一個唯一的編號表示以下圖形表示A：

array([[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], 
     [ 0., 0., 0., 0., 0., 0., 0., 1., 1., 0.], 
     [ 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.], 
     [ 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.], 
     [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], 
     [ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.], 
     [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.], 
     [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], 
     [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], 
     [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

A 1在5行，例如第0列意味着存在從節點5到節點0-5的邊緣，其對應於'P20823' - >'P01100'。使用nodes陣列從索引中查找節點名稱：

print(nodes) 
['P01100' 'P01106' 'P11831' 'P13569' 'P18146' 'P20823' 'P62136' 'Q09472' 
'Q13309' 'Q13616']

如果有很多節點和連接數，最好使用sparse matrix爲A。但是，首先嚐試使用密集矩陣，只是懶得改用稀疏的存儲器或性能問題。

來源

2017-04-25 07:21:33 kazemakase

我強烈建議networkx：

import networkx as nx 
#make the graph 
G = nx.Graph([e for e in data]) 
#compute the pagerank 
nx.pagerank(G) 
# output: 
# {'P01100': 0.0770275315329843, 'P01106': 0.14594493693403143, 
# 'P11831': 0.1, 'P13569': 0.0770275315329843, 'P18146': 0.1, 
# 'P20823': 0.1459449369340315, 'P62136': 0.1, 'Q09472': 
# 0.07702753153298428, 'Q13309': 0.07702753153298428, 'Q13616': 0.1}

這一切都需要。 pagerank文檔在這裏。

來源

2017-04-25 05:07:43 Mahdi

表示線與numpy的陣列

回答

相關問題