2017-06-23
0

Pyspark: remove null values from a column in a DataFrame

My DataFrame looks like this:

ID,FirstName,LastName
1,Navee,Srikanth
2,,Srikanth
3,Naveen,

According to my problem statement, I have to remove row 2 because its FirstName is null.

I am using the following PySpark script:

join_Df1= Name.filter(Name.col(FirstName).isnotnull()).show() 

I get the following error:

File "D:\0\NameValidation.py", line 13, in <module> 
join_Df1= filter(Name.FirstName.isnotnull()).show() 

TypeError: 'Column' object is not callable

Can anyone please help me solve this?

+0

Check out the answers at https://stackoverflow.com/questions/37262762/filter-pyspark-dataframe-column-with-none-value –

+0

Possible duplicate of [Filter Pyspark dataframe column with None value](https://stackoverflow.com/questions/37262762/filter-pyspark-dataframe-column-with-none-value) –

Answers

0

I think what you might need is notnull().

So, given this input CSV file, my_test.csv:

ID,FirstName,LastName
1,Navee,Srikanth
2,,Srikanth
3,Naveen

Code:

import pandas as pd 
df = pd.read_csv("my_test.csv") 

print(df[df['FirstName'].notnull()]) 

Output:

   ID FirstName  LastName
0   1     Navee  Srikanth
2   3    Naveen       NaN

This is what you want: df[df['FirstName'].notnull()]

df['FirstName'].notnull() outputs:

0     True
1    False
2     True

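Indexing df with this boolean Series returns a new DataFrame containing only the rows where df['FirstName'].notnull() is True.

How does it check? df['FirstName'].notnull() returns True when the FirstName value is not null, and False when it is NaN.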
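A self-contained sketch of the same pandas approach, assuming the CSV is inlined with io.StringIO instead of read from my_test.csv. It also shows dropna(subset=...), which should give the same rows here because read_csv turns the empty FirstName field into NaN by default. The names CSV_TEXT, mask, filtered and dropped are just illustrative.

```python
import io

import pandas as pd

# Same data as in the question, inlined so the sketch runs without my_test.csv.
CSV_TEXT = """ID,FirstName,LastName
1,Navee,Srikanth
2,,Srikanth
3,Naveen,
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# read_csv parses the empty FirstName in row 2 as NaN (default na handling),
# so the boolean mask from notnull() is False only for that row.
mask = df["FirstName"].notnull()
filtered = df[mask]

# Alternative: drop rows whose FirstName is NaN.
dropped = df.dropna(subset=["FirstName"])

print(filtered)
```

Note that this only works because pandas converts the empty field to NaN; if the column held literal empty strings, notnull() would keep them.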

2

You should do the following:

join_Df1.filter(join_Df1.FirstName.isNotNull()).show()

Hope this helps!

4

It looks like the FirstName column of your DataFrame contains empty strings rather than nulls. Here are some options to try:

df = sqlContext.createDataFrame([[1,'Navee','Srikanth'], [2,'','Srikanth'] , [3,'Naveen','']], ['ID','FirstName','LastName']) 
df.show() 
+---+---------+--------+
| ID|FirstName|LastName|
+---+---------+--------+
|  1|    Navee|Srikanth|
|  2|         |Srikanth|
|  3|   Naveen|        |
+---+---------+--------+

df.where(df.FirstName.isNotNull()).show()  # This doesn't remove row 2: its FirstName is an empty string, not null
+---+---------+--------+
| ID|FirstName|LastName|
+---+---------+--------+
|  1|    Navee|Srikanth|
|  2|         |Srikanth|
|  3|   Naveen|        |
+---+---------+--------+

df.where(df.FirstName != '').show() 
+---+---------+--------+
| ID|FirstName|LastName|
+---+---------+--------+
|  1|    Navee|Srikanth|
|  3|   Naveen|        |
+---+---------+--------+

df.filter(df.FirstName != '').show() 
+---+---------+--------+
| ID|FirstName|LastName|
+---+---------+--------+
|  1|    Navee|Srikanth|
|  3|   Naveen|        |
+---+---------+--------+

df.where("FirstName != ''").show() 
+---+---------+--------+
| ID|FirstName|LastName|
+---+---------+--------+
|  1|    Navee|Srikanth|
|  3|   Naveen|        |
+---+---------+--------+
+0

Perfect, Rakesh, it worked. Well explained. Thanks a ton –