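Unable to drop NaN rows from a DataFrame

I am trying to clean a DataFrame that contains some NaN values by dropping the rows that contain NaN.

I have tried all of the suggested methods, but I can't seem to get rid of the NaNs.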

df = pd.read_csv('filename.tsv', delimiter='\t') 
df = df[pd.notnull(df)] 
df = df.dropna() 

df[pd.isnull(df)] 
# gives records containing NaN (a lot of them)

I'm not sure what I am missing.

Edit: the rows that come back as NaN have NaN in every column.

One more edit: when I try to look at the types

heads = df[df.isnull()].head() 
for idx, row in heads.iterrows(): 
    print idx, type(row.listener_id)

This returns:

0 <type 'float'> 
1 <type 'float'> 
2 <type 'float'> 
3 <type 'float'> 
4 <type 'float'>

Source

2017-09-05 Fraz

Maybe 'NaN' is a string; in that case you need `df.replace('NaN', np.nan)` – jezrael

Can you add a data sample? 3-4 rows? – jezrael

Or you may need to define custom NA values in `read_csv`: [docs](http://pandas.pydata.org/pandas-docs/stable/io.html#na-values) – jezrael
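
The two suggestions in the comments above can be sketched as follows. This is only an illustration: the asker's file is not shown, so the column names, the values, and the `?` marker are all made up.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data: the literal string 'NaN' is text, not a missing value.
df = pd.DataFrame({'listener_id': ['a1', 'NaN', 'c3'],
                   'plays': [10, 20, 30]})

# dropna() does not treat the string 'NaN' as missing, so nothing is removed.
unchanged = df.dropna()

# Replacing the string with a real NaN lets dropna() remove the row.
cleaned = df.replace('NaN', np.nan).dropna()

# Alternatively, declare a custom missing-value marker while reading the file.
# The '?' marker is an assumption made for this example.
tsv = "listener_id\tplays\na1\t10\n?\t20\nc3\t30\n"
from_file = pd.read_csv(io.StringIO(tsv), delimiter='\t', na_values=['?']).dropna()

print(len(unchanged), len(cleaned), len(from_file))
```

If the row counts differ before and after the `replace`, the missing values were probably stored as text rather than as real NaN.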

If you want to use boolean indexing, I think you need:

df = df[~df.isnull().any(axis=1)]

But it is better to just use:

df = df.dropna()
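
Since the question's edit mentions rows that are NaN in every column, it may help to know that `dropna` can be restricted to such rows. A minimal sketch on made-up data; `how` and `subset` are parameters of `DataFrame.dropna`, while the column names here are only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 5, np.nan],
                   'B': [7, 8, np.nan]})

any_nan = df.dropna()             # default how='any': drop rows with at least one NaN
all_nan = df.dropna(how='all')    # drop only rows where every column is NaN
by_col = df.dropna(subset=['B'])  # only consider column B when deciding

print(any_nan.index.tolist())
print(all_nan.index.tolist())
print(by_col.index.tolist())
```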

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 5, 4, 5, 5, np.nan],
                   'B': [7, 8, 9, 4, 2, np.nan],
                   'C': [1, 3, 5, 7, 1, np.nan],
                   'D': [5, 3, 6, 9, 2, np.nan]})

print(df)
     A    B    C    D
0  NaN  7.0  1.0  5.0
1  5.0  8.0  3.0  3.0
2  4.0  9.0  5.0  6.0
3  5.0  4.0  7.0  9.0
4  5.0  2.0  1.0  2.0
5  NaN  NaN  NaN  NaN

# get True for NaN
print(df.isnull())
       A      B      C      D
0   True  False  False  False
1  False  False  False  False
2  False  False  False  False
3  False  False  False  False
4  False  False  False  False
5   True   True   True   True

# check for at least one True per row
print(df.isnull().any(axis=1))
0     True
1    False
2    False
3    False
4    False
5     True
dtype: bool

# boolean indexing, inverted with `~` (we need to select the rows with NO NaN)
print(df[~df.isnull().any(axis=1)])
     A    B    C    D
1  5.0  8.0  3.0  3.0
2  4.0  9.0  5.0  6.0
3  5.0  4.0  7.0  9.0
4  5.0  2.0  1.0  2.0

# get True for not NaN
print(df.notnull())
       A      B      C      D
0  False   True   True   True
1   True   True   True   True
2   True   True   True   True
3   True   True   True   True
4   True   True   True   True
5  False  False  False  False

# get True if all values per row are True
print(df.notnull().all(axis=1))
0    False
1     True
2     True
3     True
4     True
5    False
dtype: bool

# boolean indexing
print(df[df.notnull().all(axis=1)])
     A    B    C    D
1  5.0  8.0  3.0  3.0
2  4.0  9.0  5.0  6.0
3  5.0  4.0  7.0  9.0
4  5.0  2.0  1.0  2.0

# simplest solution
print(df.dropna())
     A    B    C    D
1  5.0  8.0  3.0  3.0
2  4.0  9.0  5.0  6.0
3  5.0  4.0  7.0  9.0
4  5.0  2.0  1.0  2.0
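
A possible explanation of why the attempt in the question seemed to do nothing: indexing a DataFrame with a boolean DataFrame of the same shape, as in `df[pd.notnull(df)]`, masks individual cells instead of selecting rows. The shape stays the same and the masked cells are shown as NaN. The sketch below demonstrates this on a small made-up frame, not on the asker's data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 5.0, 4.0],
                   'B': [7.0, 8.0, np.nan]})

# A boolean DataFrame mask keeps the shape; cells where the mask is False become NaN.
masked = df[pd.notnull(df)]
print(masked.shape)  # same shape as df, no rows dropped

# For the same reason, df[df.isnull()] displays only NaN, even after dropna().
clean = df.dropna()
print(clean[clean.isnull()])  # every cell is NaN, although clean contains none

# A boolean Series (one value per row) is what selects rows.
rows = df[df.notnull().all(axis=1)]
print(rows)
```

So seeing NaN in the output of `df[pd.isnull(df)]` does not by itself mean that `dropna()` failed.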

Source

2017-09-05 07:07:47 jezrael

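Yes.. that's it.. can you elaborate on this? – Fraz

Yes, I am creating a data sample. Give me some time – jezrael

In the end, both of the following removed the NaNs: `df = df[~df.isnull().any(axis=1)]` and `df = df.dropna()`. Afterwards `df[df.isnull()].head()` returns an empty DataFrame, so the NaN values are gone. – Fraz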
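
One way to confirm that no missing values remain is to count them directly rather than inspect a masked frame. A minimal sketch on made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 5.0, 4.0],
                   'B': [7.0, 8.0, np.nan]})
df = df.dropna()

remaining = int(df.isnull().sum().sum())  # total number of NaN cells left
has_nan = bool(df.isnull().any().any())   # True if any cell is still NaN

print(remaining, has_nan)
```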
