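Joining DataFrames in Spark with Java
First of all, thank you for reading my question.
My problem is the following: in Spark with Java, I load the data from two CSV files into two DataFrames.
These DataFrames contain the following information.
DataFrame airport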
Id | Name | City
-----------------------
1 | Barajas | Madrid
DataFrame airport_city_state
City | state
----------------
Madrid | España
I want to join these two DataFrames so that the result looks like this:
DataFrame result
Id | Name | City | state
--------------------------
1 | Barajas | Madrid | España
where dfairport.city = dfairport_city_state.city
But I cannot work out the syntax to perform the join correctly. Here is some of the code showing how I created the variables:
// Load the CSV files; the loader has to be told that there is a header row and which delimiter is used
Dataset<Row> dfairport = Load.Csv(sqlContext, data_airport);
Dataset<Row> dfairport_city_state = Load.Csv(sqlContext, data_airport_city_state);
// Rename the columns of the CSV DataFrames to match the columns in the database
// Once the names match, we can insert them
// withColumnRenamed returns a new Dataset, so the result is assigned back
dfairport = dfairport
    .withColumnRenamed("leg_key", "id")
    .withColumnRenamed("leg_name", "name")
    .withColumnRenamed("leg_city", "city");
dfairport_city_state = dfairport_city_state
    .withColumnRenamed("city", "ciudad")
    .withColumnRenamed("state", "estado");
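
For reference, here is a minimal sketch of the join described above. It assumes Spark 2.x or later with a `SparkSession`, and it uses small in-memory DataFrames built from the sample rows in the tables instead of the CSV files. The class name `AirportJoinSketch` and the methods `joinUsingColumn`, `joinWithCondition`, and `runExample` are hypothetical names chosen for illustration; they are not part of the code in the question.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class AirportJoinSketch {

    // Variant 1: inner join on a column name that exists on both sides.
    // The result keeps a single "City" column; select() fixes the column order.
    static Dataset<Row> joinUsingColumn(Dataset<Row> airports, Dataset<Row> cityStates) {
        return airports.join(cityStates, "City")
                .select("Id", "Name", "City", "state");
    }

    // Variant 2: inner join on an explicit condition (usable when the names differ).
    // Both "City" columns survive the join, so the right-hand one is dropped.
    static Dataset<Row> joinWithCondition(Dataset<Row> airports, Dataset<Row> cityStates) {
        return airports
                .join(cityStates, airports.col("City").equalTo(cityStates.col("City")), "inner")
                .drop(cityStates.col("City"));
    }

    // Builds the two sample DataFrames from the question and returns the joined rows.
    static List<Row> runExample() {
        SparkSession spark = SparkSession.builder()
                .appName("AirportJoinSketch")
                .master("local[1]") // assumption: local mode, for illustration only
                .getOrCreate();
        try {
            StructType airportSchema = new StructType()
                    .add("Id", DataTypes.IntegerType)
                    .add("Name", DataTypes.StringType)
                    .add("City", DataTypes.StringType);
            StructType cityStateSchema = new StructType()
                    .add("City", DataTypes.StringType)
                    .add("state", DataTypes.StringType);

            Dataset<Row> dfairport = spark.createDataFrame(
                    Arrays.asList(RowFactory.create(1, "Barajas", "Madrid")), airportSchema);
            Dataset<Row> dfairportCityState = spark.createDataFrame(
                    Arrays.asList(RowFactory.create("Madrid", "España")), cityStateSchema);

            return joinUsingColumn(dfairport, dfairportCityState).collectAsList();
        } finally {
            spark.stop();
        }
    }

    public static void main(String[] args) {
        for (Row row : runExample()) {
            System.out.println(row);
        }
    }
}
```

If the second DataFrame's column has already been renamed to "ciudad" as in the code above, the shared-name variant no longer applies; the explicit condition would then presumably compare `dfairport.col("city")` with `dfairport_city_state.col("ciudad")`.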