2015-12-31 46 views

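I have an RDF graph (link) made of (s,p,o) tuples, and I built a property graph from it. I need to do a join/joinVertices, or otherwise add a field to the graph, using Spark GraphX.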

val propGraph = Graph(vertexArray,edgeArray).cache() 
propGraph.triplets.foreach(println(_)) 

The RDF data and the resulting output look like this:

((0,<http://umkc.edu/xPropGraph#franklin>),(1,<http://umkc.edu/xPropGraph#rxin>),<http://umkc.edu/xPropGraph#advisor>) 
((1,<http://umkc.edu/xPropGraph#rxin>),(2,<http://umkc.edu/xPropGraph#jgonzal>),<http://umkc.edu/xPropGraph#collab>) 
((2147483648,<http://umkc.edu/xPropGraph#peter>),(4294967295,<http://umkc.edu/xPropGraph#John),<http://umkc.edu/xPropGraph#student>) 
((6442450942,<http://umkc.edu/xPropGraph#istoica>),(0,<http://umkc.edu/xPropGraph#franklin>),<http://umkc.edu/xPropGraph#colleague>) 
((0,<http://umkc.edu/xPropGraph#franklin>),(2,<http://umkc.edu/xPropGraph#jgonzal>),<http://umkc.edu/xPropGraph#pi>) 

When I apply connectedComponents() to my RDF property graph with the code below (Complete code), I get the ccID as follows:

val cc = propGraph.connectedComponents().cache() 
cc.triplets.foreach(println(_)) 

With this output:

((0,0),(2,0),<http://umkc.edu/xPropGraph#pi>) 
((0,0),(1,0),<http://umkc.edu/xPropGraph#advisor>) 
((1,0),(2,0),<http://umkc.edu/xPropGraph#collab>) 
((2147483648,2147483648),(4294967295,2147483648),<http://umkc.edu/xPropGraph#student>) 
((6442450942,0),(0,0),<http://umkc.edu/xPropGraph#colleague>) 

I need to get something like:

((vId_src,src_att),(vId_dst,dst_att),property, ccID) 

That is, I need the result in this triplet/graph format:

((0,<http://umkc.edu/xPropGraph#franklin>),(2,<http://umkc.edu/xPropGraph#jgonzal>),<http://umkc.edu/xPropGraph#pi>,0) 
((6442450942,<http://umkc.edu/xPropGraph#istoica>),(0,<http://umkc.edu/xPropGraph#franklin>),<http://umkc.edu/xPropGraph#colleague>,0) 
((0,<http://umkc.edu/xPropGraph#franklin>),(1,<http://umkc.edu/xPropGraph#rxin>),<http://umkc.edu/xPropGraph#advisor>,0) 
((1,<http://umkc.edu/xPropGraph#rxin>),(2,<http://umkc.edu/xPropGraph#jgonzal>),<http://umkc.edu/xPropGraph#collab>,0) 
((2147483648,<http://umkc.edu/xPropGraph#peter>),(4294967295,<http://umkc.edu/xPropGraph#John),<http://umkc.edu/xPropGraph#student>,2147483648) 

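So my best option is probably a join. I tried something like val triplets = propGraph.joinVertices(cc.vertices), but could not get it to work. Is there any way to get this?

Any help is appreciated! I am new to GraphX. :)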


It would be helpful if you provided an example graph (see e.g. http://stackoverflow.com/q/34528963/1560062). It is not clear what the types are here, and the Scala print output is not very useful. – zero323


@zero323 Thanks for the suggestion. I have added two links. Any help is appreciated! – ChikuMiku

Answer


I was looking for ((vId_src,src_att),(vId_dst,dst_att),property, ccID), so I used zip() on the two RDDs.

import org.apache.spark.graphx.{Graph, VertexId}
import org.apache.spark.rdd.RDD

// Label every vertex with the ID of its connected component.
val cc: Graph[VertexId, String] = propGraph.connectedComponents().cache()
println("###GRAPH WITH CONNECTED COMPONENTS ###")
cc.triplets.foreach(println(_))
println("###VERTICES OF CONNECTED COMPONENTS GRAPH ###")
cc.vertices.foreach(println(_))
println("###EDGES OF CONNECTED COMPONENTS GRAPH ###")
cc.edges.foreach(println(_))


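// Alternative to a join: format each triplet as a string, then zip it with the component IDs.
// Note: zip() pairs elements by position, so it assumes both RDDs have the same number of
// partitions and the same number of elements in each partition.
println("###STEP-2 GETTING ONE MERGED RDD OF NEW GRAPH###")
val newGraph: RDD[String] = propGraph.triplets.map(t =>
  t.srcId + "," + t.srcAttr + "),(" + t.dstId + "," + t.dstAttr + ")," + t.attr)
// In cc, the source vertex attribute is the component ID.
val ccID: RDD[String] = cc.triplets.map(t => t.srcAttr.toString)
val newPropGraph: RDD[(String, String)] = newGraph.zip(ccID)
newPropGraph.collect.foreach(println(_))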

After doing this, I got the following output:

(4294967296,<http://umkc.edu/xPropGraph#node1>),(2147483649,<http://umkc.edu/xPropGraph#node2>),<http://umkc.edu/xPropGraph#prop1>,0) 
(2147483649,<http://umkc.edu/xPropGraph#node2>),(6442450942,<http://umkc.edu/xPropGraph#node4>),<http://umkc.edu/xPropGraph#prop5>,0) 
(4294967295,<http://umkc.edu/xPropGraph#node5>),(2147483648,<http://umkc.edu/xPropGraph#node6>),<http://umkc.edu/xPropGraph#prop3>,2147483648) 
(0,<http://umkc.edu/xPropGraph#node3>),(6442450942,<http://umkc.edu/xPropGraph#node4>),<http://umkc.edu/xPropGraph#prop2>,0) 
(2147483649,<http://umkc.edu/xPropGraph#node2>),(0,<http://umkc.edu/xPropGraph#node3>),<http://umkc.edu/xPropGraph#prop4>,0) 
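
For anyone who would rather not rely on zip() pairing rows by position, here is a rough sketch of a join keyed on vertex ID, using GraphX's outerJoinVertices. The object name CcJoinSketch, the helper attachComponentIds, and the small sample graph are made up for illustration and are not from the original code. It assumes vertex and edge attributes are both String, as the printed triplets suggest.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

object CcJoinSketch {
  // Hypothetical helper: returns rows shaped like
  // ((vId_src, src_att), (vId_dst, dst_att), property, ccID).
  def attachComponentIds(
      propGraph: Graph[String, String]
  ): RDD[((VertexId, String), (VertexId, String), String, VertexId)] = {
    // connectedComponents() replaces each vertex attribute with its component ID.
    val cc: Graph[VertexId, String] = propGraph.connectedComponents()

    // Join the component IDs back onto the original vertices by vertex ID,
    // keeping the original attribute alongside the component ID.
    val withCc: Graph[(String, VertexId), String] =
      propGraph.outerJoinVertices(cc.vertices) {
        (id, attr, ccOpt) => (attr, ccOpt.getOrElse(id))
      }

    // Both endpoints of an edge are in the same component,
    // so the source vertex's component ID is used for the triplet.
    withCc.triplets.map { t =>
      ((t.srcId, t.srcAttr._1), (t.dstId, t.dstAttr._1), t.attr, t.srcAttr._2)
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cc-join-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Small stand-in for vertexArray / edgeArray from the question.
    val vertices: RDD[(VertexId, String)] = sc.parallelize(Seq(
      (0L, "<http://umkc.edu/xPropGraph#franklin>"),
      (1L, "<http://umkc.edu/xPropGraph#rxin>"),
      (2L, "<http://umkc.edu/xPropGraph#jgonzal>")))
    val edges: RDD[Edge[String]] = sc.parallelize(Seq(
      Edge(0L, 1L, "<http://umkc.edu/xPropGraph#advisor>"),
      Edge(1L, 2L, "<http://umkc.edu/xPropGraph#collab>")))

    attachComponentIds(Graph(vertices, edges)).collect().foreach(println)
    sc.stop()
  }
}
```

Because the join is keyed on vertex ID, it does not depend on the two triplet RDDs sharing the same partitioning and ordering. The result also stays a typed tuple rather than a concatenated string, which may be easier to work with downstream.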