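Adding a column to a Dataset<Row> with withColumn() fails with "resolved attribute(s) missing ... in operator"

I can usually get a new dataset by adding a new column to an existing dataset with withColumn, but I don't understand why this case gives an error.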

Dataset<Row> inputDSAAcolonly = inputDSAA.select(colNameA); 
Dataset<Row> inputDSBBcolonly = inputDSBB.select(colNameB); 
inputDSBBcolonly.withColumn(colNameA, inputDSAAcolonly.apply(colNameA)).show();

where inputDSAAcolonly is

+----+ 
|Exer| 
+----+ 
|Some| 
|None| 
|None| 
|None|

and inputDSBBcolonly is

+-----+ 
|Smoke| 
+-----+ 
|Never| 
|Regul| 
|Occas| 
|Never|

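Each dataset is essentially a single column.

I need a dataset with the 2 columns side by side. withColumn has worked for me before, but this call throws an error: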

Exception in thread "main" org.apache.spark.sql.AnalysisException: resolved attribute(s) Exer#7 missing from Smoke#47 in operator
!Project [Smoke#47, Exer#7 AS Exer#112];;!Project [Smoke#47, Exer#7 AS Exer#112]

2017-03-01 Binu

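You are essentially trying to join two datasets.

withColumn works on one dataframe, using expressions over that same dataframe's columns. You are trying to use a column that belongs to a different dataframe.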
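To make the error message concrete, here is a toy model in plain Java of the check that produces it. This is not Spark's code: the class `AttributeResolutionSketch` and its method `missingFrom` are hypothetical names, and the model only assumes that each column reference carries a unique ID (the `#7` in `Exer#7`) and that an operator may reference only IDs its input actually outputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AttributeResolutionSketch {
    /** A column reference tagged with a unique ID, printed like Exer#7. */
    static final class Attribute {
        final String name;
        final int exprId;

        Attribute(String name, int exprId) {
            this.name = name;
            this.exprId = exprId;
        }

        @Override
        public String toString() {
            return name + "#" + exprId;
        }
    }

    /** Returns the referenced attributes whose IDs the input does not output. */
    static List<String> missingFrom(List<Attribute> inputOutput, List<Attribute> referenced) {
        Set<Integer> available = new HashSet<>();
        for (Attribute a : inputOutput) {
            available.add(a.exprId);
        }
        List<String> missing = new ArrayList<>();
        for (Attribute a : referenced) {
            if (!available.contains(a.exprId)) {
                missing.add(a.toString());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Attribute exer = new Attribute("Exer", 7);
        Attribute smoke = new Attribute("Smoke", 47);

        // inputDSBBcolonly outputs only Smoke#47, yet the projection references Exer#7.
        System.out.println(missingFrom(Arrays.asList(smoke), Arrays.asList(smoke, exer))); // [Exer#7]

        // Referencing only columns the dataset already outputs leaves nothing missing.
        System.out.println(missingFrom(Arrays.asList(smoke), Arrays.asList(smoke))); // []
    }
}
```

In this model the first call corresponds to the failing `!Project [Smoke#47, Exer#7 AS Exer#112]`: `Exer#7` was never part of the input that produces `Smoke#47`.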

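If your real problem is as simple as your code, you can select both columns from a dataset that contains both and do the relevant operation there. Otherwise, you need to do a join.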
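A minimal sketch of the first option, assuming both columns live in one parent dataset. The class `SideBySideColumns`, the helper names, and the in-memory sample rows are illustrative assumptions, not code from the question; it also assumes Spark SQL is on the classpath and runs with a local master.

```java
import static org.apache.spark.sql.functions.col;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class SideBySideColumns {
    /** Builds a hypothetical parent dataset holding both columns. */
    static Dataset<Row> sampleParent(SparkSession spark) {
        StructType schema = new StructType()
                .add("Exer", DataTypes.StringType)
                .add("Smoke", DataTypes.StringType);
        List<Row> rows = Arrays.asList(
                RowFactory.create("Some", "Never"),
                RowFactory.create("None", "Regul"),
                RowFactory.create("None", "Occas"),
                RowFactory.create("None", "Never"));
        return spark.createDataFrame(rows, schema);
    }

    /** Selects both columns in one step, so every reference resolves against the same dataset. */
    static Dataset<Row> bothColumns(Dataset<Row> parent) {
        return parent.select(col("Smoke"), col("Exer"));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("side-by-side-columns")
                .getOrCreate();
        bothColumns(sampleParent(spark)).show();
        spark.stop();
    }
}
```

If the two columns really come from unrelated datasets, there is no shared parent to select from, and a join on an explicit key is the remaining option.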

2017-03-01 09:17:45

Agree with @Assaf. See also [here](http://stackoverflow.com/questions/40508489/spark-add-dataframe-column-to-another-dataframe-merge-two-dataframes). – pheeleeppoo

I have been using it without zipping or adding an index column across two datasets. For example, I can do this: Dataset<Row> dswithColAAandTotal = dswithColAA.withColumn("Total", dswithColAA.col(colnames[0])); dswithColAAandTotal.show(); This works fine and gives me ColAA and a computed column Total. – Binu

Here dswithColAA was obtained from the dataset as follows: DSvaluesonly.createOrReplaceTempView("tmpTableAA"); String SQLQueryAA = "select * from tmpTableAA"; Dataset<Row> dswithColAA = sqlctx.sql(SQLQueryAA).toDF(); – Binu

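// Static imports this snippet relies on:
// import static org.apache.spark.sql.functions.col;
// import static org.apache.spark.sql.functions.monotonically_increasing_id;
Dataset<Row> inputDS = spark.read().option("header", false)
        .option("inferSchema", true).csv("data/irisAA.csv");
inputDS.show(4);
String colAname = inputDS.columns()[0];
log.error(colAname);
String colBname = inputDS.columns()[1];
log.error(colBname);

// Two single-column datasets derived from the same parent.
Dataset<Row> DSColA = inputDS.select(inputDS.col(colAname));
DSColA.show(4);
Dataset<Row> DSColB = inputDS.select(inputDS.col(colBname));
DSColB.show(4);

// Works: column A is still part of DSColA's output, so the reference resolves.
Dataset<Row> DSColAandColA = DSColA.withColumn("Addt_Column", inputDS.col(colAname));
DSColAandColA.show(4);
/* Fails: the select above dropped column B, so it is not part of DSColA's output.
Dataset<Row> DSColAandColB = DSColA.withColumn("Addt_Column", inputDS.col(colBname));
DSColAandColB.show(4); */

// Add an index column to each dataset to serve as a join key.
// Note: these IDs are unique and increasing but not consecutive; they only line up
// across two datasets that have the same partitioning and row order.
Dataset<Row> DSColAwithIndex = DSColA.withColumn("df1Key", monotonically_increasing_id());
DSColAwithIndex.show(4);
Dataset<Row> DSColBwithIndex = DSColB.withColumn("df2Key", monotonically_increasing_id());
DSColBwithIndex.show(4);

// No join condition, so these are Cartesian products; some Spark versions
// reject this unless cross joins are enabled.
DSColAwithIndex.join(DSColBwithIndex).show(4);
DSColA.join(DSColB).show(4);

// Inner join on the index columns, then drop the keys.
Dataset<Row> DSwithJoinofTwo = DSColAwithIndex.join(DSColBwithIndex, col("df1Key").equalTo(col("df2Key")), "inner");
DSwithJoinofTwo.show(4);
Dataset<Row> DSwithJointrimmed = DSwithJoinofTwo.drop(DSwithJoinofTwo.apply("df1Key")).drop(DSwithJoinofTwo.apply("df2Key"));
DSwithJointrimmed.show(4); // Joined dataset of column A and column B, from the same or different datasets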
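One caveat on using monotonically_increasing_id() as the join key: the Spark documentation describes the generated ID as the partition index in the upper 31 bits plus the record number within the partition in the lower 33 bits. The plain-Java sketch below reproduces only that arithmetic, with hypothetical per-partition row counts, to show when two datasets get matching keys. `MonotonicIdSketch`, `idFor`, and `idsFor` are made-up names, not Spark API.

```java
import java.util.Arrays;

public class MonotonicIdSketch {
    /** Documented layout: partition index in the upper 31 bits, record number in the lower 33. */
    static long idFor(int partitionIndex, long rowInPartition) {
        return ((long) partitionIndex << 33) + rowInPartition;
    }

    /** IDs a dataset would get, given a hypothetical number of rows in each partition. */
    static long[] idsFor(int[] rowsPerPartition) {
        int total = 0;
        for (int n : rowsPerPartition) {
            total += n;
        }
        long[] ids = new long[total];
        int next = 0;
        for (int p = 0; p < rowsPerPartition.length; p++) {
            for (long r = 0; r < rowsPerPartition[p]; r++) {
                ids[next++] = idFor(p, r);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Four rows in a single partition.
        System.out.println(Arrays.toString(idsFor(new int[] {4})));    // [0, 1, 2, 3]
        // The same four rows split across two partitions.
        System.out.println(Arrays.toString(idsFor(new int[] {2, 2}))); // [0, 1, 8589934592, 8589934593]
    }
}
```

In the code above both single-column datasets come from the same inputDS, so they plausibly share one partitioning and the keys match. For two independently loaded datasets an inner join on these IDs could silently drop rows, so a key built from a real row index is the safer choice.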

2017-03-03 15:13:57 Binu

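For adding the column – Binu

For the join, I combined the datasets by adding an index column to each and using an inner join. withColumn seems to work when you create a dataset with an extra column that depends on columns of that same dataset, as Assaf suggested. I will explore further and try to better understand how columns can be used and what their limitations are. For now the above should be fine, I guess. Thanks. – Binu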
