Merging intersecting many-to-many relationships in Spark

Given an RDD[(A, B)] with a many-to-many relationship between A and B, how can the intersecting relationships be grouped together?

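That is, if a relationship can be drawn from one A to another A via one or more Bs, those As should be grouped together. Likewise, the Bs should be grouped via the As that connect them.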

For example, the set:

(1, 'a') 
(2, 'a') 
(2, 'b') 
(1, 'c') 
(3, 'f') 
(4, 'f') 
(5, 'g')

should be grouped into

([1,2], ['a','b','c']) 
([3,4], ['f']) 
([5], ['g'])

With groupByKey I can get

(1, ['a', 'c']) 
(2, ['a', 'b']) 
(3, ['f']) 
(4, ['f']) 
(5, ['g'])

and also

('a', [1, 2]) 
('b', [2]) 
('c', [1]) 
('f', [3,4]) 
('g', [5])

But I don't know where to take it from here.

2016-07-29 Synesso

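RDDs don't support this kind of operation out of the box! I think your first step is right. After either groupBy, you need to fold the lists together as required. – rakesh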

Answer

object ManyToMany extends App {
    val m = List(
        (1, 'a'),
        (2, 'a'),
        (2, 'b'),
        (1, 'c'),
        (3, 'f'),
        (4, 'f'),
        (5, 'g'))

    // Each Int mapped to the set of Chars it is paired with, and vice versa.
    val mInt: Map[Int, Set[Char]] = m.groupBy(_._1).map { case (a, pairs) => a -> pairs.map(_._2).toSet }
    val mChar: Map[Char, Set[Int]] = m.groupBy(_._2).map { case (b, pairs) => b -> pairs.map(_._1).toSet }

    // Fold the Char sets together: every set collected so far that shares an
    // element with the incoming set is merged with it. Merging all overlapping
    // sets (rather than matching on exactly one) avoids a MatchError when an
    // item bridges two or more existing groups.
    val c: List[Set[Char]] = m.map { case (a, _) => mInt(a) }.foldLeft(List.empty[Set[Char]]) {
        case (sum, item) =>
            val (overlapping, disjoint) = sum.partition(_.exists(item.contains))
            if (overlapping.isEmpty) item :: sum
            else disjoint ++ List(overlapping.reduce(_ ++ _) ++ item)
    }

    // For each merged Char set, collect every Int paired with any of its Chars.
    val d: List[(Set[Char], Set[Int])] = c.map(x => (x, x.flatMap(ch => mChar(ch))))
    println(d)
}
result: 
List((Set(g),Set(5)), (Set(a, c, b),Set(1, 2)), (Set(f),Set(3, 4)))
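
The grouping asked for here is the same as finding connected components in a graph whose edges are the (A, B) pairs. As a minimal sketch of that idea, assuming the data fits in a local collection (it does not touch Spark), a union-find structure can replace the list folding. The object name `RelationGroups` and the method `group` are hypothetical names chosen for illustration, not part of the answer above or of any library.

```scala
import scala.collection.mutable

object RelationGroups {
  // Hypothetical helper: groups (A, B) pairs into connected components.
  def group[A, B](pairs: Seq[(A, B)]): Set[(Set[A], Set[B])] = {
    // Nodes are Left(a) or Right(b), so equal values on the two sides never collide.
    type Node = Either[A, B]
    val parent = mutable.Map.empty[Node, Node]

    // Find the representative of a node's group, compressing the path on the way.
    def find(n: Node): Node = {
      val p = parent.getOrElseUpdate(n, n)
      if (p == n) n
      else {
        val root = find(p)
        parent(n) = root
        root
      }
    }

    // Every pair (a, b) is an edge: merge the groups of its two endpoints.
    pairs.foreach { case (a, b) =>
      val ra = find(Left(a))
      val rb = find(Right(b))
      if (ra != rb) parent(ra) = rb
    }

    // Bucket the pairs by the representative of their A, then split each
    // bucket into its set of As and its set of Bs.
    pairs
      .groupBy { case (a, _) => find(Left(a)) }
      .values
      .map(ps => (ps.map(_._1).toSet, ps.map(_._2).toSet))
      .toSet
  }
}
```

Called on the sample pairs from the question, `RelationGroups.group` returns the three groups ([1,2], [a,b,c]), ([3,4], [f]) and ([5], [g]) as an unordered set. For an actual RDD this local sketch would not scale; a distributed connected-components routine, such as `connectedComponents` in GraphX, is probably the more natural fit there.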

2016-08-01 02:41:05 chenhry
