2013-02-27 93 views
0

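Python: deleting a number of edges from a list of dictionaries

I was wondering whether there is a Pythonic way to do this.

Say I have a list: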

{'source': 338, 'target': 343, 'value': 0.667693} 
{'source': 339, 'target': 342, 'value': 0.628195} 
{'source': 340, 'target': 346, 'value': 0.529861} 
{'source': 340, 'target': 342, 'value': 0.470139} 
{'source': 341, 'target': 342, 'value': 0.762871} 
{'source': 342, 'target': 349, 'value': 0.664869} 
{'source': 343, 'target': 347, 'value': 0.513025} 
{'source': 343, 'target': 344, 'value': 0.486975} 
{'source': 344, 'target': 347, 'value': 0.536706} 
{'source': 344, 'target': 349, 'value': 0.463294} 
{'source': 345, 'target': 349, 'value': 0.546326} 
{'source': 345, 'target': 346, 'value': 0.453674} 

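Basically it is an undirected graph, but a very messy one. I want to clean it up.

So I want to leave the top 2 nodes, the ones with the most edges, exactly as they are in the original.

The rest of the nodes should each be connected to at most 5 edges.

Right now I just keep a dictionary with the edge count per node and sort it in reverse order.

Then I save the top 2 and go through the list again, eliminating edges while checking against the top 2.

Is there a cleaner way to do this?

My buggy, messy sample code: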

import json 
from pprint import pprint 
import operator 
json_data=open('topics350_1.json') 

data = json.load(json_data) 
edges = data["links"] 
node_count_dict = {} 
super_nodes = 3 
min_nodes = 5 

for edge in edges:
    keys = [edge['source'], edge['target']]
    for key in keys:
        if key in node_count_dict:
            node_count_dict[key] += 1
        else:
            node_count_dict[key] = 1

sorted_nodes = sorted(node_count_dict.iteritems(), key=operator.itemgetter(1),reverse = True)   
#print sorted_nodes 
top_nodes = sorted_nodes[super_nodes] 
final_node_count = {} 
for key in sorted_nodes: 
    final_node_count[key[0]] = 0 
print final_node_count 
link_list = [] 
for edge in edges:
    keys = [edge['source'], edge['target']]
    for key in keys:
        if key not in top_nodes:
            if final_node_count[key] < min_nodes:
                link_list.append(edge)
print link_list




#print data['links'] 
+1

If you are working with graphs, your best bet is to use networkx – Abhijit 2013-02-27 04:37:31

+0

Why not model it as `edges = {338: (343, 0.667693), 339: (342, 0.628195)}` instead of a list of `dict`s? – inspectorG4dget 2013-02-27 04:38:34

+0

Can you show the actual sample code? – mgilson 2013-02-27 04:46:36

Answers

1

I strongly recommend using networkx to work with graphs.

import networkx as nx 
G = nx.Graph() 
# build your Graph 
# G.add_node(), G.add_nodes_from(), G.add_edge(), G.add_edges_from()... 

nodes = [(g, G.degree(g)) for g in G.nodes()] 
# nodes looks like this: [(338, 4), (340, 7), ...]
# item one is the node, item two is the number of edges connected to that node

nodes.sort(key=lambda n: n[1], reverse=True) 

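# you want to delete the third node and the other nodes with more than 5 edges, right?
G.remove_node(nodes[2][0])  # index 0 is the node itself; index 1 is its degree
for n, e in nodes:
    # skip nodes that are already gone; remove_node raises on a missing node
    if e > 5 and n in G:
        G.remove_node(n)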

However, going by your code above, I would write it like this:

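from collections import Counter

# collect both endpoints of every edge
sources = []
for edge in edges:
    sources.append(edge['source'])
    sources.append(edge['target'])

# (node, edge count) pairs, highest count first
sources_count = Counter(sources)
sources_count = sorted(sources_count.items(), key=lambda s: s[1], reverse=True)

sources_count.pop(2)  # drop the third-ranked node
# node ids with at most 5 edges; a set of ids so the membership tests below work
valid_nodes = set(node for node, count in sources_count if count <= 5)

# keep only the edges whose endpoints are both valid
link_list = [
    e for e in edges
    if e['source'] in valid_nodes and e['target'] in valid_nodes
]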
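
For comparison, here is a minimal sketch of the rule as the question states it: the busiest nodes keep all of their edges, and every other node ends up with at most 5. The function name `prune_edges`, its parameters, and the choice to prefer edges with a higher `value` when something has to be dropped are all assumptions for illustration; the question does not say which edges should go first.

```python
from collections import Counter


def prune_edges(edges, keep_top=2, max_edges=5):
    """Keep every edge of the `keep_top` busiest nodes; cap the rest at `max_edges`.

    Hypothetical helper: the name, the parameters and the tie-breaking
    rule (higher 'value' wins) are assumptions, not part of the question.
    """
    # Count how many edges touch each node.
    degree = Counter()
    for edge in edges:
        degree[edge['source']] += 1
        degree[edge['target']] += 1

    # The busiest nodes are left untouched.
    hubs = {node for node, _ in degree.most_common(keep_top)}

    kept = []
    kept_count = Counter()

    def keep(edge):
        kept.append(edge)
        kept_count[edge['source']] += 1
        kept_count[edge['target']] += 1

    # Pass 1: every edge that touches a hub is kept unconditionally.
    rest = []
    for edge in edges:
        if edge['source'] in hubs or edge['target'] in hubs:
            keep(edge)
        else:
            rest.append(edge)

    # Pass 2: visit the remaining edges from highest to lowest 'value' and
    # keep one only while both of its endpoints are still under the cap.
    for edge in sorted(rest, key=lambda e: e['value'], reverse=True):
        if (kept_count[edge['source']] < max_edges
                and kept_count[edge['target']] < max_edges):
            keep(edge)

    return kept
```

With the question's data this would be called as `link_list = prune_edges(data["links"])`. Note that the returned list puts the hub edges first, so the original edge order is not preserved.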