對於熊貓我有一個代碼段是這樣的:火花條件替換值的
def setUnknownCatValueConditional(df, conditionCol, condition, colToSet, _valueToSet='KEINE'):
df.loc[(df[conditionCol] == condition) & (df[colToSet].isnull()), colToSet] = _valueToSet
其中有條件將在數據幀替換值。
試圖端口此功能引發
df.withColumn("A", when($"A" === "x" and $"B" isNull, "replacement")).show
並沒有爲我工作了
df.withColumn("A", when($"A" === "x" and $"B" isNull, "replacement")).show
warning: there was one feature warning; re-run with -feature for details
org.apache.spark.sql.AnalysisException: cannot resolve '((`A` = 'x') AND `B`)' due to data type mismatch: differing types in '((`A` = 'X') AND `B`)' (boolean and string).;;
即使df.printSchema返回一個字符串和b
什麼是錯的這裏?
編輯的最小例如:
import java.sql.{ Date, Timestamp }
case class FooBar(foo:Date, bar:String)
val myDf = Seq(("2016-01-01","first"),("2016-01-02","second"),("2016-wrongFormat","noValidFormat"), ("2016-01-04","lastAssumingSameDate"))
.toDF("foo","bar")
.withColumn("foo", 'foo.cast("Date"))
.as[FooBar]
myDf.printSchema
root
|-- foo: date (nullable = true)
|-- bar: string (nullable = true)
scala> myDf.show
+----------+--------------------+
| foo| bar|
+----------+--------------------+
|2016-01-01| first|
|2016-01-02| second|
| null| noValidFormat|
|2016-01-04|lastAssumingSameDate|
+----------+--------------------+
myDf.withColumn("foo", when($"bar" === "noValidFormat" and $"foo" isNull, "noValue")).show
而在條件情況下鏈接的預期輸出
+----------+--------------------+
| foo| bar|
+----------+--------------------+
|2016-01-01| first|
|2016-01-02| second|
| "noValue"| noValidFormat|
|2016-01-04|lastAssumingSameDate|
+----------+--------------------+
EDIT2
需要
df
.withColumn("A",
when(
(($"B" === "x") and ($"B" isNull)) or
(($"B" === "y") and ($"B" isNull)), "replacement")
應該工作
請分享示例數據和預期輸出 – mtoto
@mtoto請參閱編輯。 –