Converting the GraphFrames shortestPaths Map into DataFrame rows in PySpark

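I am trying to find the most efficient way to take the Map output of the GraphFrames function shortestPaths and flatten each vertex's distances map into individual rows of a new DataFrame. I have managed, very clumsily, to pull the distances column into a dictionary, convert that to a pandas DataFrame, and then convert it back to a Spark DataFrame, but I know there must be a better way.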

from graphframes import * 

v = sqlContext.createDataFrame([ 
    ("a", "Alice", 34), 
    ("b", "Bob", 36), 
    ("c", "Charlie", 30), 
], ["id", "name", "age"]) 

# Create an Edge DataFrame with "src" and "dst" columns 
e = sqlContext.createDataFrame([ 
    ("a", "b", "friend"), 
    ("b", "c", "follow"), 
    ("c", "b", "follow"), 
], ["src", "dst", "relationship"]) 

# Create a GraphFrame 
g = GraphFrame(v, e) 

results = g.shortestPaths(landmarks=["a", "b","c"]) 
results.select("id","distances").show() 

+---+--------------------+
| id|           distances|
+---+--------------------+
|  a|Map(a -> 0, b -> ...|
|  b| Map(b -> 0, c -> 1)|
|  c| Map(c -> 0, b -> 1)|
+---+--------------------+

What I want is to take the output above and flatten the distances, while keeping the ids, into something like this:

+---+---+--------+
| id|  v|distance|
+---+---+--------+
|  a|  a|       0|
|  a|  b|       1|
|  a|  c|       2|
|  b|  b|       0|
|  b|  c|       1|
|  c|  c|       0|
|  c|  b|       1|
+---+---+--------+

Thanks.


2016-06-18 JiveDonut

You can use explode:

>>> from pyspark.sql.functions import explode 
>>> results.select("id", explode("distances"))


2016-06-18 22:19:45
