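How to iterate over GraphML with lxml
I have the following GraphML file, "mygraph.gml", which I want to parse with a simple Python script.
It represents a simple graph with 2 nodes, "node0" and "node1", and an edge between them: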
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key id="name" for="node" attr.name="name" attr.type="string"/>
<key id="weight" for="edge" attr.name="weight" attr.type="double"/>
<graph id="G" edgedefault="directed">
<node id="n0">
<data key="name">node1</data>
</node>
<node id="n1">
<data key="name">node2</data>
</node>
<edge source="n1" target="n0">
<data key="weight">1</data>
</edge>
</graph>
</graphml>
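This represents a graph with two nodes, n0 and n1, joined by an edge of weight 1. I want to parse this structure with Python.
I wrote a script using lxml (I need lxml because the real datasets are far larger than this simple example, with more than 10^5 nodes, and Python's minidom is too slow):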
import lxml.etree as et
tree = et.parse('mygraph.gml')
root = tree.getroot()
graphml = {
    "graph": "{http://graphml.graphdrawing.org/xmlns}graph",
    "node": "{http://graphml.graphdrawing.org/xmlns}node",
    "edge": "{http://graphml.graphdrawing.org/xmlns}edge",
    "data": "{http://graphml.graphdrawing.org/xmlns}data",
    "label": "{http://graphml.graphdrawing.org/xmlns}data[@key='label']",
    "x": "{http://graphml.graphdrawing.org/xmlns}data[@key='x']",
    "y": "{http://graphml.graphdrawing.org/xmlns}data[@key='y']",
    "size": "{http://graphml.graphdrawing.org/xmlns}data[@key='size']",
    "r": "{http://graphml.graphdrawing.org/xmlns}data[@key='r']",
    "g": "{http://graphml.graphdrawing.org/xmlns}data[@key='g']",
    "b": "{http://graphml.graphdrawing.org/xmlns}data[@key='b']",
    "weight": "{http://graphml.graphdrawing.org/xmlns}data[@key='weight']",
    "edgeid": "{http://graphml.graphdrawing.org/xmlns}data[@key='edgeid']"
}
graph = tree.find(graphml.get("graph"))
nodes = graph.findall(graphml.get("node"))
edges = graph.findall(graphml.get("edge"))
This script gets the nodes and edges correctly, so I can simply iterate over them:
for n in nodes:
    print(n.attrib)
or, similarly, over the edges:
for e in edges:
    print(e.attrib['source'], e.attrib['target'])
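But I can't really work out how to get the "data" tag of an edge or a node, so that I can print the edge weight and the node's "name" label.
This did not work for me: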
weights = graph.findall(graphml.get("weight"))
The resulting list is always empty. Why? I'm missing something, but I can't see what.
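A likely explanation, offered as a sketch rather than a definitive answer: in the ElementTree-style paths that find/findall accept, a bare tag path only matches direct children of the element it is called on. The <data> elements are children of <node> and <edge>, not of <graph>, so graph.findall(...) has nothing to return. Searching relative to each node or edge, or using a ".//" descendant path, should work. The example embeds a trimmed copy of the file as a bytes literal (the schemaLocation attributes are left out); the names GRAPHML, NS, node_names, edge_weights and all_weights are illustrative, and the stdlib fallback import is only there so the sketch runs where lxml is not installed.

```python
try:
    import lxml.etree as et
except ImportError:
    # Assumption for this sketch only: the stdlib module offers the same
    # find/findall calls used below.
    import xml.etree.ElementTree as et

# Trimmed copy of mygraph.gml (bytes, because of the encoding declaration).
GRAPHML = b"""<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="name" for="node" attr.name="name" attr.type="string"/>
  <key id="weight" for="edge" attr.name="weight" attr.type="double"/>
  <graph id="G" edgedefault="directed">
    <node id="n0"><data key="name">node1</data></node>
    <node id="n1"><data key="name">node2</data></node>
    <edge source="n1" target="n0"><data key="weight">1</data></edge>
  </graph>
</graphml>"""

NS = "{http://graphml.graphdrawing.org/xmlns}"

root = et.fromstring(GRAPHML)
graph = root.find(NS + "graph")

# A bare tag path looks at direct children of <graph> only: no <data> there.
direct = graph.findall(NS + "data[@key='weight']")

# Look up the <data> child relative to each node instead.
node_names = {}
for n in graph.findall(NS + "node"):
    name = n.find(NS + "data[@key='name']")
    node_names[n.get("id")] = name.text if name is not None else None

# Same idea for the edges; the weight text is converted to float.
edge_weights = []
for e in graph.findall(NS + "edge"):
    w = e.find(NS + "data[@key='weight']")
    weight = float(w.text) if w is not None else None
    edge_weights.append((e.get("source"), e.get("target"), weight))

# Alternative: ".//" searches all descendants of <graph>.
all_weights = [d.text for d in graph.findall(".//" + NS + "data[@key='weight']")]

print(direct)
print(node_names)
print(edge_weights)
print(all_weights)
```

The per-element lookup keeps each weight attached to its own edge, which the flat ".//" search does not; for a file on disk, et.parse('mygraph.gml').getroot() would replace the et.fromstring call.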
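Thanks! This is definitely the solution I was looking for! Thanks again, now I understand the structure of the tree and how to iterate over it. – linello 2012-04-18 10:09:13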