PySpark DataFrame - how to pass a string variable as the df.where() condition

I'm not sure whether this is possible in PySpark. I believe it should be; I just haven't managed to get it working :(

Requirement: get any records whose FNAME and LNAME are both null or 0.

Expected result: the first two rows.

df = sqlContext.read.format('com.databricks.spark.csv').options(header='true').load(fileName) 
df.show() 

+-----+-----+----+
|FNAME|LNAME|CITY|
+-----+-----+----+
|    0| null|  NY|
| null|    0|null|
|  Joe| null|  LA|
| null| Deon|  SA|
|Steve| Mark|null|
+-----+-----+----+

colCondition = []
for col in df.columns:
    condition = '(df.'+col+'.isNull() | df.'+col+' == 0)'
    colCondition.append(condition)

dfWhereConditon = ' & '.join(colCondition)

This is what I'm trying to achieve:

df.where(dfWhereConditon)

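This doesn't work because dfWhereConditon is treated as a plain string inside where(), not as Python code. How can I fix this, or is there a better way to achieve it?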
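To see why this fails, it helps to look at the string the loop actually builds. df.where() accepts either a Column or a string, and a string is parsed as a Spark SQL expression, so Python method-call syntax such as df.FNAME.isNull() is not valid there. The snippet below reproduces the loop in plain Python; the hard-coded column list is an assumption standing in for df.columns, taken from the show() output above.

```python
# Stand-in for df.columns (assumed from the show() output in the question).
columns = ['FNAME', 'LNAME', 'CITY']

colCondition = []
for col in columns:
    # Same string construction as in the question.
    colCondition.append('(df.' + col + '.isNull() | df.' + col + ' == 0)')

dfWhereConditon = ' & '.join(colCondition)
print(dfWhereConditon)
```

Note a second problem hiding in that text: in Python, `|` binds more tightly than `==`, so even if the string were evaluated as Python, `df.FNAME.isNull() | df.FNAME == 0` would mean `(df.FNAME.isNull() | df.FNAME) == 0`. Each comparison needs its own parentheses, e.g. `df.FNAME.isNull() | (df.FNAME == 0)`.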

Thanks


2017-08-22 just10minutes

If you want to use a string condition, you can write it as a SQL filter clause:

condition = ' AND '.join(['('+ col + ' IS NULL OR ' + col + ' = 0)' for col in df.columns]) 
df.filter(condition)
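The answer iterates over df.columns, which for the sample data also includes CITY, so the condition additionally requires CITY to be null or 0; the first row (CITY = NY) would then not match. To follow the stated requirement (FNAME and LNAME only), you can pass just those columns. A minimal sketch of that, using a hypothetical helper `null_or_zero_condition` (not part of PySpark) that builds the same kind of SQL string as the answer. It also wraps each name in backticks so column names containing spaces still parse, assuming no name contains a backtick itself.

```python
def null_or_zero_condition(columns):
    """Build a Spark SQL filter string requiring every listed column to be NULL or 0.

    Hypothetical helper: names are backtick-quoted, assuming they contain no backticks.
    """
    clauses = ['(`{0}` IS NULL OR `{0}` = 0)'.format(c) for c in columns]
    return ' AND '.join(clauses)


condition = null_or_zero_condition(['FNAME', 'LNAME'])
print(condition)
```

With a DataFrame in hand, usage would be `df.filter(null_or_zero_condition(['FNAME', 'LNAME'])).show()`, which should return the first two rows the question expects.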


2017-08-22 11:11:00 MaFF

Great, thanks a lot. That did the trick – just10minutes
