Does the GraphFrames API support creating bipartite graphs?

Does the GraphFrames API support creating bipartite graphs in the current version?

Current version: 0.1.0

Spark version: 1.6.1

2016-04-13 Praneeth Reddy G

No, it doesn't, unless you follow the solution provided [here](http://stackoverflow.com/a/33243012/3415409) – eliasah

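As pointed out in the comments on the question, neither GraphFrames nor GraphX has built-in support for bipartite graphs. However, both are flexible enough to let you create one. For a GraphX solution, see this previous answer. That solution uses a trait shared between the different vertex/object types. While that works with RDDs, it does not work with DataFrames. A row in a DataFrame has a fixed schema: it can't sometimes contain a price column and sometimes not. It can have a price column that is sometimes null, but the column must exist in every row.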
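To make the fixed-schema point concrete, here is a minimal sketch, assuming a spark-shell session on Spark 1.6 (so `toDF` and the `$"..."` syntax are in scope). The `items` DataFrame and its `kind` column are made up for illustration; only the `price` column comes from the discussion above.

```scala
// Hypothetical data: only some rows carry a price.
val items = Seq[(Int, String, Option[Double])](
    (1, "book", Some(9.99)),
    (2, "user", None)
).toDF("id", "kind", "price")

// The price column exists for every row; rows without a price hold null.
items.printSchema
items.filter($"price".isNull).show
```

An `Option` field in the tuple is what makes the column nullable here; there is no way to leave the column out for just one row.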

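Instead, the solution for GraphFrames seems to be to define a DataFrame that is essentially a linear subtype of both types of objects in your bipartite graph: it has to contain all of the fields of both types. This is actually pretty easy, since a join with full_outer gives you exactly that. Something like this: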

// Outside spark-shell, import sqlContext.implicits._ to get toDF and the $"..." syntax.
val players = Seq(
    (1, "dave", 34),
    (2, "griffin", 44)
).toDF("id", "name", "age")

val teams = Seq(
    (101, "lions", "7-1"),
    (102, "tigers", "5-3"),
    (103, "bears", "0-9")
).toDF("id", "team", "record")

You can then create a superset DataFrame like this:

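// coalesce lives in org.apache.spark.sql.functions; import it if your session has not already.
val teamPlayer = players.withColumnRenamed("id", "l_id").join(
    teams.withColumnRenamed("id", "r_id"),
    $"r_id" === $"l_id", "full_outer"
).withColumn("l_id", coalesce($"l_id", $"r_id"))  // keep whichever id is non-null
 .drop($"r_id")
 .withColumnRenamed("l_id", "id")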

teamPlayer.show 

+---+-------+----+------+------+
| id|   name| age|  team|record|
+---+-------+----+------+------+
|101|   null|null| lions|   7-1|
|102|   null|null|tigers|   5-3|
|103|   null|null| bears|   0-9|
|  1|   dave|  34|  null|  null|
|  2|griffin|  44|  null|  null|
+---+-------+----+------+------+
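The answer stops at the vertex DataFrame, so here is a minimal sketch of how it could be turned into a GraphFrame. It assumes the same spark-shell session (with `teamPlayer` defined as above) and the graphframes package on the classpath. The `playsFor` edge list and its `relationship` column are hypothetical; the source does not say which player belongs to which team.

```scala
import org.graphframes.GraphFrame

// Hypothetical edges from players to teams.
// GraphFrames expects the endpoint columns to be named "src" and "dst".
val playsFor = Seq(
    (1, 101, "plays_for"),
    (2, 102, "plays_for")
).toDF("src", "dst", "relationship")

// teamPlayer already has the "id" column that GraphFrames requires for vertices.
val g = GraphFrame(teamPlayer, playsFor)

g.vertices.show
g.edges.show
```

Nothing in GraphFrames enforces that the graph is bipartite; that property comes entirely from how the edge list is built.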

You could perhaps make this a bit cleaner with structs:

val tpStructs = players.select($"id" as "l_id", struct($"name", $"age") as "player").join(
    teams.select($"id" as "r_id", struct($"team", $"record") as "team"),
    $"l_id" === $"r_id",
    "full_outer"
).withColumn("l_id", coalesce($"l_id", $"r_id"))
 .drop($"r_id")
 .withColumnRenamed("l_id", "id")

tpStructs.show 

+---+------------+------------+
| id|      player|        team|
+---+------------+------------+
|101|        null| [lions,7-1]|
|102|        null|[tigers,5-3]|
|103|        null| [bears,0-9]|
|  1|   [dave,34]|        null|
|  2|[griffin,44]|        null|
+---+------------+------------+
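With the struct layout, a vertex's side of the graph can be read off from which struct is non-null, which makes it possible to check that an edge list really is bipartite. The following is only a sketch under assumptions: it continues the same spark-shell session (with `tpStructs` defined as above), and the `edges` DataFrame is hypothetical.

```scala
import org.apache.spark.sql.functions.col

// Hypothetical edge list; every edge should go from a player to a team.
val edges = Seq((1, 101), (2, 102)).toDF("src", "dst")

// Tag each vertex id with the side it belongs to.
val srcSide = tpStructs.select(col("id") as "src", col("player").isNotNull as "srcIsPlayer")
val dstSide = tpStructs.select(col("id") as "dst", col("team").isNotNull as "dstIsTeam")

// Count edges that do not run from a player vertex to a team vertex.
val violations = edges
    .join(srcSide, "src")
    .join(dstSide, "dst")
    .filter(!(col("srcIsPlayer") && col("dstIsTeam")))
    .count

println(s"non-bipartite edges: $violations")
```

Because these are inner joins, edges pointing at unknown vertex ids are silently dropped and not counted, so a separate check would be needed for dangling edges.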

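I'll also point out that more or less the same solution would work in GraphX with RDDs. You can always create the vertices by joining two case classes that don't share any traits: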

case class Player(name: String, age: Int)
val playerRdd = sc.parallelize(Seq(
    (1L, Player("dave", 34)),
    (2L, Player("griffin", 44))
))

case class Team(team: String, record: String)
val teamRdd = sc.parallelize(Seq(
    (101L, Team("lions", "7-1")),
    (102L, Team("tigers", "5-3")),
    (103L, Team("bears", "0-9"))
))

// Each vertex id maps to (Option[Player], Option[Team]); exactly one side is defined.
playerRdd.fullOuterJoin(teamRdd).collect foreach println
(101,(None,Some(Team(lions,7-1))))
(1,(Some(Player(dave,34)),None))
(102,(None,Some(Team(tigers,5-3))))
(2,(Some(Player(griffin,44)),None))
(103,(None,Some(Team(bears,0-9))))

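With all due respect to the previous answer, this seems like a more flexible way to handle it, without having to share a trait between the combined objects.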
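For completeness, here is a minimal sketch of how the joined RDD could serve as the vertex set of a GraphX graph. It assumes the same spark-shell session, with `playerRdd` and `teamRdd` defined as above. The edge list and its `"plays_for"` label are hypothetical.

```scala
import org.apache.spark.graphx.{Edge, Graph}

// Vertex attribute type: (Option[Player], Option[Team])
val vertices = playerRdd.fullOuterJoin(teamRdd)

// Hypothetical edges from players to teams.
val edges = sc.parallelize(Seq(
    Edge(1L, 101L, "plays_for"),
    Edge(2L, 102L, "plays_for")
))

val graph = Graph(vertices, edges)

// For each edge, print the player on the source side and the team on the destination side.
graph.triplets.collect.foreach { t =>
    println(s"${t.srcAttr._1} ${t.attr} ${t.dstAttr._2}")
}
```

Pattern matching on the two `Option` values is then enough to tell a player vertex from a team vertex, with no shared trait required.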


2016-04-21 04:42:00
