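I have some tables in which I need to mask some of the columns. The columns to mask vary from table to table, and I read those columns from the application.conf file. How can I mask columns using Spark 2?
For example, for the employee table shown below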
+----+------+-----+---------+
| id | name | age | address |
+----+------+-----+---------+
| 1 | abcd | 21 | India |
+----+------+-----+---------+
| 2 | qazx | 42 | Germany |
+----+------+-----+---------+
if we want to mask the name and age columns, I get these columns in a sequence:
val mask = Seq("name", "age")
The expected values after masking are:
+----+----------------+----------------+---------+
| id | name | age | address |
+----+----------------+----------------+---------+
| 1 | *** Masked *** | *** Masked *** | India |
+----+----------------+----------------+---------+
| 2 | *** Masked *** | *** Masked *** | Germany |
+----+----------------+----------------+---------+
If I have a DataFrame for the employee table, how can I mask these columns?
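Similarly, if I have the payment table shown below and want to mask the name and salary columns, I get the mask columns as the sequence that follows the table.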
+----+------+--------+----------+
| id | name | salary | tax_code |
+----+------+--------+----------+
| 1 | abcd | 12345 | KT10 |
+----+------+--------+----------+
| 2 | qazx | 98765 | AD12d |
+----+------+--------+----------+
val mask = Seq("name", "salary")
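For context, the per-table mask lists could be read from application.conf roughly as in the sketch below. It assumes the Typesafe Config library is on the classpath, and the `masking.<table>` key layout, the object name MaskConfig and the helper maskFor are hypothetical, since the actual configuration file is not shown here.

```scala
import com.typesafe.config.{Config, ConfigFactory}
import scala.collection.JavaConverters._

object MaskConfig {
  // Hypothetical layout (assumption). A real application.conf on the classpath
  // would normally be loaded with ConfigFactory.load() instead of parseString.
  val sample: String =
    """
      |masking {
      |  employee = ["name", "age"]
      |  payment  = ["name", "salary"]
      |}
      |""".stripMargin

  // Returns the columns to mask for a table, or an empty list when the table has no entry.
  def maskFor(conf: Config, table: String): Seq[String] = {
    val path = s"masking.$table"
    if (conf.hasPath(path)) conf.getStringList(path).asScala.toList else Nil
  }

  def main(args: Array[String]): Unit = {
    val conf = ConfigFactory.parseString(sample)
    println(maskFor(conf, "employee"))
    println(maskFor(conf, "payment"))
  }
}
```

Returning an empty list for unknown tables means a table without an entry is simply left unmasked, which may or may not be the desired default.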
I tried something like this:
mask.foreach(c => base.withColumn(c, regexp_replace(col(c), "^.*?$", "*** Masked ***")))
but it did not return anything.
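The foreach attempt produces nothing because DataFrames are immutable: withColumn returns a new DataFrame, and foreach discards each of those results and itself returns Unit. One way to keep the results is to thread the DataFrame through foldLeft. The following is a minimal sketch of that idea, not the accepted solution. The object name MaskWithFold, the helper maskColumns and the local SparkSession are assumptions for illustration, and lit is used in place of regexp_replace, so null values are masked as well.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

object MaskWithFold {
  // Each step receives the DataFrame produced by the previous step,
  // so every withColumn result is kept instead of being thrown away.
  def maskColumns(base: DataFrame, mask: Seq[String]): DataFrame =
    mask.foldLeft(base) { (df, c) => df.withColumn(c, lit("*** Masked ***")) }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption).
    val spark = SparkSession.builder().master("local[*]").appName("mask-fold").getOrCreate()
    import spark.implicits._

    val employee = Seq((1, "abcd", 21, "India"), (2, "qazx", 42, "Germany"))
      .toDF("id", "name", "age", "address")

    maskColumns(employee, Seq("name", "age")).show(false)
    spark.stop()
  }
}
```

Because withColumn replaces an existing column of the same name in place, the column order of the result matches the input.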
Thanks to @philantrovert, I found the solution. Here is the solution I used:
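import org.apache.spark.sql.DataFrame

// Build one SQL expression per column: a masked literal for the listed columns,
// the plain column name for everything else, then select them all in one pass.
def maskData(base: DataFrame, maskColumns: Seq[String]): DataFrame = {
  val maskExpr = base.columns.map { col => if (maskColumns.contains(col)) s"'*** Masked ***' as ${col}" else col }
  base.selectExpr(maskExpr: _*)
}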
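The same single-projection idea can also be written with the Column API instead of SQL expression strings, which avoids assembling SQL text by hand. This is only a sketch under assumptions: the object name MaskWithSelect, the Masked constant and the local SparkSession are illustrative, and backtick-quoting the pass-through columns is an added precaution for names containing dots or spaces, not part of the solution above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object MaskWithSelect {
  val Masked = "*** Masked ***"

  // One projection over all columns: listed names become a literal,
  // all other columns pass through unchanged and in their original order.
  def maskData(base: DataFrame, maskColumns: Seq[String]): DataFrame = {
    val toMask = maskColumns.toSet
    val projected = base.columns.map { c =>
      // Backticks make Spark treat the whole name as a single column reference.
      if (toMask.contains(c)) lit(Masked).as(c) else col(s"`$c`")
    }
    base.select(projected: _*)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption).
    val spark = SparkSession.builder().master("local[*]").appName("mask-select").getOrCreate()
    import spark.implicits._

    val payment = Seq((1, "abcd", 12345, "KT10"), (2, "qazx", 98765, "AD12d"))
      .toDF("id", "name", "salary", "tax_code")

    maskData(payment, Seq("name", "salary")).show(false)
    spark.stop()
  }
}
```

Names in the mask list that do not exist in the table are simply ignored, so one helper can be reused for tables with different mask lists.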
Thanks. It works. – Shekhar