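If a string contains a number, replace the string with null - Spark
I'm new to Spark-Scala and I'm trying to clean some data. I'm having trouble cleaning the FIRSTNAME and LASTNAME columns because some of the strings contain digits. How can I detect the digits and replace the whole string with null?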
Consider the following dataframe:
+---------+--------+
|FIRSTNAME|LASTNAME|
+---------+--------+
| Steve| 10 C|
| Mark| 9436|
| Brian| Lara|
+---------+--------+
How do I get this:
+---------+--------+
|FIRSTNAME|LASTNAME|
+---------+--------+
| Steve| null|
| Mark| null|
| Brian| Lara|
+---------+--------+
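One possible approach, offered as a sketch rather than a confirmed answer, is a regular-expression test with `Column.rlike` inside `when(...).otherwise(...)`. `rlike` searches for the pattern anywhere in the value, so `"[0-9]"` flags any string that contains at least one digit. The object name `NullIfDigit` and the helpers `nullIfHasDigit` and `cleanColumns` are hypothetical, and the example assumes a local `SparkSession` with the sample rows copied from the question:

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

object NullIfDigit {
  // Hypothetical helper: returns null when the value contains at least one digit 0-9.
  // rlike performs a regex search (not a full match), so "10 C" matches "[0-9]".
  // Null inputs stay null: the condition evaluates to null, so otherwise(c) is used.
  def nullIfHasDigit(c: Column): Column =
    when(c.rlike("[0-9]"), lit(null)).otherwise(c)

  // Hypothetical helper: applies the rule to every named column, keeping the column names.
  def cleanColumns(df: DataFrame, names: Seq[String]): DataFrame =
    names.foldLeft(df)((acc, name) => acc.withColumn(name, nullIfHasDigit(col(name))))

  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark session, just for trying the example out.
    val spark = SparkSession.builder().master("local[*]").appName("null-if-digit").getOrCreate()

    // Sample data copied from the question.
    val df2 = spark
      .createDataFrame(Seq(("Steve", "10 C"), ("Mark", "9436"), ("Brian", "Lara")))
      .toDF("FIRSTNAME", "LASTNAME")

    cleanColumns(df2, Seq("FIRSTNAME", "LASTNAME")).show()
    spark.stop()
  }
}
```

Folding over the column names means the same rule covers FIRSTNAME and LASTNAME without repeating the `withColumn` call.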
Any help would be greatly appreciated. Thank you very much!
Edit:
scala> df2.withColumn("LASTNAME_TEMP", when(col("LASTNAME").contains("1"), null).otherwise(col("LASTNAME"))).show()
+---------+--------+-------------+
|FIRSTNAME|LASTNAME|LASTNAME_TEMP|
+---------+--------+-------------+
| Steve| 10 C| null|
| Mark| 9436| 9436|
| Brian| Lara| Lara|
+---------+--------+-------------+
But the code above only checks for a single substring ("1"). I'd rather have it check against a list of strings. For example:
val numList = List("1", "2", "3", "4", "5", "6", "7", "8", "9", "0")
I declared the list above and ran the following code:
scala> df2.filter(col("LASTNAME").isin(numList:_*)).show()
I got the following (empty) dataframe:
+---------+--------+
|FIRSTNAME|LASTNAME|
+---------+--------+
+---------+--------+
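The empty result is what `isin` is expected to produce: it keeps rows whose LASTNAME is exactly equal to one of the list elements ("0", "1", ..., "9"), and none of "10 C", "9436" or "Lara" is a single digit. To use the list as a set of substrings instead, one option is to build one `contains` test per element and OR them together. This is a sketch; the object name `NullIfContainsAny` and the helper `containsAny` are hypothetical, and a local `SparkSession` is assumed:

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

object NullIfContainsAny {
  // Hypothetical helper: true when the column contains at least one of the given substrings.
  // Each element gets its own `contains` test and the tests are OR-ed together,
  // starting from lit(false) so an empty list simply matches nothing.
  def containsAny(c: Column, needles: Seq[String]): Column =
    needles.foldLeft(lit(false))((acc, n) => acc || c.contains(n))

  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark session, just for trying the example out.
    val spark = SparkSession.builder().master("local[*]").appName("contains-any").getOrCreate()

    val numList = List("1", "2", "3", "4", "5", "6", "7", "8", "9", "0")
    val df2 = spark
      .createDataFrame(Seq(("Steve", "10 C"), ("Mark", "9436"), ("Brian", "Lara")))
      .toDF("FIRSTNAME", "LASTNAME")

    // isin compares whole values, so only a LASTNAME of exactly "0".."9" would survive.
    df2.filter(col("LASTNAME").isin(numList: _*)).show()

    // containsAny checks substrings: "10 C" and "9436" become null, "Lara" is kept.
    df2
      .withColumn("LASTNAME", when(containsAny(col("LASTNAME"), numList), lit(null)).otherwise(col("LASTNAME")))
      .show()

    spark.stop()
  }
}
```

For digits specifically, a single `rlike("[0-9]")` does the same job; the `foldLeft` version is mainly useful when the list holds arbitrary substrings.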
What have you tried so far? What problems did you run into when running the code you wrote? – Dima