Drop all rows that contain a question mark (?) from a DataFrame

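I have a Pandas DataFrame in which some values are missing (denoted by ?). Is there an easy way to delete every row in which at least one column has the value ??

Normally I would use boolean indexing, but I have a lot of columns. One way is as follows: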

for index, row in df.iterrows():
    for col in df.columns:
        # str() keeps the check from failing on numeric cells
        if '?' in str(row[col]):
            df = df.drop(index)  # delete row
            break

But this seems unPythonic...

Any ideas?

Source

2017-09-17 bclayman

Option 1a
boolean indexing and any

df
      col1 col2 col3 col4
row1    65   24   47    ?
row2    33   48    ?   89
row3     ?   34   67    ?
row4    24   12   52   17

(df.astype(str) == '?').any(axis=1)
row1     True
row2     True
row3     True
row4    False
dtype: bool

df = df[~(df.astype(str) == '?').any(axis=1)]
df
      col1 col2 col3 col4
row4    24   12   52   17

Here, the astype(str) call is there to prevent TypeError: Could not compare ['?'] with block values, which is raised if your DataFrame has a mix of string and numeric columns.
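
As a rough, self-contained sketch of Option 1a on a frame that mixes integers and strings: the helper name drop_rows_with_marker and the sample data are made up for illustration and are not part of the answer above.

```python
import pandas as pd


def drop_rows_with_marker(df, marker="?"):
    """Return the rows of df in which no cell equals marker (hypothetical helper)."""
    # Cast every cell to str so the comparison is safe on mixed-type columns.
    has_marker = (df.astype(str) == marker).any(axis=1)
    return df[~has_marker]


# Illustrative data: col1 is purely numeric, col2 mixes ints and '?'.
df = pd.DataFrame(
    {"col1": [65, 33, 24], "col2": [24, "?", 12]},
    index=["row1", "row2", "row3"],
)
clean = drop_rows_with_marker(df)
print(clean)
```

Note that astype(str) is only used to build the mask; the rows that are kept retain their original values and dtypes.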

Option 1b
with values

(df.values == '?').any(1) 
array([ True, True, True, False], dtype=bool) 

df = df[~(df.values == '?').any(1)] 
df 
    col1 col2 col3 col4 
row4 24 12 52 17
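
A small sketch of why the comparison on values works without astype(str), using made-up data: when a frame holds both numbers and strings, values is an object-dtype NumPy array, and == is applied element by element.

```python
import pandas as pd

# Illustrative data only: both columns contain a '?' somewhere.
df = pd.DataFrame({"a": [1, "?", 3], "b": ["x", "y", "?"]})

arr = df.values                    # object-dtype array, since cells are mixed
mask = (arr == "?").any(axis=1)    # True for rows with at least one '?'
filtered = df[~mask]

print(arr.dtype)
print(mask)
print(filtered)
```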

Option 2
df.replace and df.notnull

df.replace('?', np.nan).notnull().all(axis=1)
row1    False
row2    False
row3    False
row4     True
dtype: bool

df = df[df.replace('?', np.nan).notnull().all(axis=1)]
df
      col1 col2 col3 col4
row4    24   12   52   17

This avoids the call to astype(str) before the comparison. Alternatively, you could do as Wen suggested and just drop them:

df.replace('?', np.nan).dropna()
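
One thing worth keeping in mind with the replace/dropna route: dropna cannot tell a converted ? from a NaN that was already in the data, so rows with genuine missing values are dropped too. The sketch below uses made-up data to contrast it with the comparison-based mask.

```python
import numpy as np
import pandas as pd

# Illustrative data: row 1 has a '?', row 2 has a pre-existing NaN.
df = pd.DataFrame({"a": [1, "?", 3, 4], "b": [5, 6, np.nan, 8]})

# replace + dropna removes both the '?' row and the NaN row.
via_dropna = df.replace("?", np.nan).dropna()

# The comparison-based mask removes only the '?' row.
via_mask = df[~(df.astype(str) == "?").any(axis=1)]

print(via_dropna.index.tolist())
print(via_mask.index.tolist())
```

If existing NaN values should survive, the mask-based options are the safer choice; if they should go as well, replace plus dropna handles both in one step.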

Source

2017-09-17 23:54:19

Or just replace it with NaN and use dropna

df.replace({'?':np.nan}).dropna() 
Out[126]: 
    col1 col2 col3 col4 
row4 24 12 52 17

Source

2017-09-18 01:57:53 Wen

You can use boolean indexing with all to check that no value in a row is ?

If the types are mixed (numbers and strings):

df = pd.DataFrame({'B':[4,5,'?',5,5,4], 
        'C':[7,'?',9,4,2,3], 
        'D':[1,3,5,7,'?',0], 
        'E':[5,3,'?',9,2,4]}) 

print (df) 
    B C D E 
0 4 7 1 5 
1 5 ? 3 3 
2 ? 9 5 ? 
3 5 4 7 9 
4 5 2 ? 2 
5 4 3 0 4 

df = df[(df.astype(str) != '?').all(axis=1)].astype(int) 
print (df) 
    B C D E 
0 4 7 1 5 
3 5 4 7 9 
5 4 3 0 4

Or compare against the NumPy array created by values:

df = df[(df.values != '?').all(axis=1)] 
print (df) 
    B C D E 
0 4 7 1 5 
3 5 4 7 9 
5 4 3 0 4

If all values are strings, the solution can be simplified:

df = pd.DataFrame({'B':[4,5,'?',5,5,4], 
        'C':[7,'?',9,4,2,3], 
        'D':[1,3,5,7,'?',0], 
        'E':[5,3,'?',9,2,4]}).astype(str) 


df = df[(df != '?').all(axis=1)].astype(int) 
print (df) 
    B C D E 
0 4 7 1 5 
3 5 4 7 9 
5 4 3 0 4

Source

2017-09-18 05:39:36 jezrael
