2017-02-10 61 views
-1

I am trying to get the CROSS JOIN of two DataFrames. I am using Spark 2.0. How can I perform a CROSS JOIN on two DataFrames?

Edit:

val df=df.join(df_t1, df("Col1")===df_t1("col")).join(df2,joinType=="cross join").where(df("col2")===df2("col2")) 
+0

Show us what you have tried. ... –

+0

val df=df.join(df_t1, df("Col1")===df_t1("col")).join(df2,joinType=="cross join").where(df("col2")===df2("col2")) – Miruthan

Answers

0

Call join with the other DataFrame, without specifying a join condition.

Take a look at the example below. Given the first DataFrame, people:

+---+------+-------+------+ 
| id| name| mail|idArea| 
+---+------+-------+------+ 
| 1| Jack|[email protected]|  1| 
| 2|Valery|[email protected]|  1| 
| 3| Karl|[email protected]|  2| 
| 4| Nick|[email protected]|  2| 
| 5| Luke|[email protected]|  3| 
| 6| Marek|[email protected]|  3| 
+---+------+-------+------+ 

and the second DataFrame, area:

+------+--------------+ 
|idArea|  areaName| 
+------+--------------+ 
|  1|Amministration| 
|  2|  Public| 
|  3|   Store| 
+------+--------------+ 

the CROSS JOIN is simply given by:

val cross = people.join(area) 
+---+------+-------+------+------+--------------+ 
| id| name| mail|idArea|idArea|  areaName| 
+---+------+-------+------+------+--------------+ 
| 1| Jack|[email protected]|  1|  1|Amministration| 
| 1| Jack|[email protected]|  1|  3|   Store| 
| 1| Jack|[email protected]|  1|  2|  Public| 
| 2|Valery|[email protected]|  1|  1|Amministration| 
| 2|Valery|[email protected]|  1|  3|   Store| 
| 2|Valery|[email protected]|  1|  2|  Public| 
| 3| Karl|[email protected]|  2|  1|Amministration| 
| 3| Karl|[email protected]|  2|  2|  Public| 
| 3| Karl|[email protected]|  2|  3|   Store| 
| 4| Nick|[email protected]|  2|  3|   Store| 
| 4| Nick|[email protected]|  2|  2|  Public| 
| 4| Nick|[email protected]|  2|  1|Amministration| 
| 5| Luke|[email protected]|  3|  2|  Public| 
| 5| Luke|[email protected]|  3|  3|   Store| 
| 5| Luke|[email protected]|  3|  1|Amministration| 
| 6| Marek|[email protected]|  3|  1|Amministration| 
| 6| Marek|[email protected]|  3|  2|  Public| 
| 6| Marek|[email protected]|  3|  3|   Store| 
+---+------+-------+------+------+--------------+ 
2

Upgrade to the latest version of spark-sql_2.11, version 2.1.0, and use the Dataset function .crossJoin.
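
For illustration, here is a minimal sketch of what that could look like with the people and area data from the other answer. The object name `CrossJoinSketch`, the helper `crossPeopleWithAreas`, and the local session settings are hypothetical names chosen for this example, and the mail column is left out because its values are not legible above. Note that, depending on the Spark version, a plain `join` without a condition may be rejected unless `spark.sql.crossJoin.enabled` is set to `true`, which is one reason to prefer the explicit `crossJoin`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CrossJoinSketch {

  // Hypothetical helper: builds the two sample DataFrames and cross joins them.
  def crossPeopleWithAreas(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Sample rows mirroring the people table (mail column omitted).
    val people = Seq(
      (1, "Jack", 1), (2, "Valery", 1), (3, "Karl", 2),
      (4, "Nick", 2), (5, "Luke", 3), (6, "Marek", 3)
    ).toDF("id", "name", "idArea")

    // Sample rows mirroring the area table.
    val area = Seq(
      (1, "Amministration"), (2, "Public"), (3, "Store")
    ).toDF("idArea", "areaName")

    // Explicit Cartesian product; Dataset.crossJoin exists since Spark 2.1.0.
    // Every person is paired with every area: 6 x 3 = 18 rows.
    people.crossJoin(area)
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (an assumption of this sketch).
    val spark = SparkSession.builder()
      .appName("cross-join-sketch")
      .master("local[*]")
      .getOrCreate()

    crossPeopleWithAreas(spark).show()

    spark.stop()
  }
}
```

If you need to filter the product afterwards, as in the question, you can chain a `.where(...)` on the result; just keep in mind that both inputs here have an `idArea` column, so refer to it through the original DataFrames (for example `people("idArea") === area("idArea")`) rather than by bare name.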