I am trying to concatenate two PySpark dataframes, which look like this:
df1:
+---+---+
|  a|  b|
+---+---+
|  a|  b|
|  1|  2|
+---+---+
only showing top 2 rows
df2:
+---+---+
|  c|  d|
+---+---+
|  c|  d|
|  7|  8|
+---+---+
only showing top 2 rows
They both have the same number of rows, and I would like to get something like this:
+---+---+---+---+
|  a|  b|  c|  d|
+---+---+---+---+
|  a|  b|  c|  d|
|  1|  2|  7|  8|
+---+---+---+---+
I tried:
df1=df1.withColumn('c', df2.c).collect()
df1=df1.withColumn('d', df2.d).collect()
But it did not work, and it gave me this error:
Traceback (most recent call last):
  File "/usr/hdp/current/spark-client/python/pyspark/sql/utils.py", line 45, in deco
    return f(*a, **kw)
  File "/usr/hdp/current/spark-client/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o2804.withColumn.
Is there any way to do this?
Thanks
ROWNUMBER() would be the way to do a join like this. – Suresh
I am new to pyspark and I don't know how to do that. – abdelkarim
Have you tried [this](https://stackoverflow.com/questions/37332434/concatenate-two-pyspark-dataframes)? – ChatterOne