2013-02-25
3

I want to create a boolean DataFrame in which np.nan maps to False and any positive real value maps to True; in other words, return a boolean DataFrame.

import numpy as np 
import pandas as pd 
DF = pd.DataFrame({'a':[1,2,3,4,np.nan],'b':[np.nan,np.nan,np.nan,5,np.nan]}) 

DF.apply(bool) # Does not work 
DF.where(DF.isnull() == False) # Does not work 
DF[DF.isnull() == False] # Does not work 

Answers

2

Weird, but it looks like - np.isnan(df) beats pd.notnull(df) by a wide margin:

In [1]: import pandas as pd 

In [2]: import numpy as np 

In [3]: df = pd.DataFrame({'a':[1,2,3,4,np.nan],'b':[np.nan,np.nan,np.nan,5,np.nan]}) 


In [4]: - np.isnan(df) 
Out[4]: 
     a  b 
0 True False 
1 True False 
2 True False 
3 True True 
4 False False 

In [5]: %timeit - np.isnan(df) 
10000 loops, best of 3: 159 us per loop 

In [6]: %timeit pd.notnull(df) 
1000 loops, best of 3: 1.22 ms per loop 
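
As a quick sanity check that the two approaches agree on this all-float frame, here is a minimal sketch. It uses `~` rather than unary `-` for the inversion, since current NumPy rejects unary minus on boolean arrays; the variable names `mask_isnan` and `mask_notnull` are illustrative and not from the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, np.nan],
                   'b': [np.nan, np.nan, np.nan, 5, np.nan]})

# ~ inverts a boolean mask element-wise (True where the value is not NaN)
mask_isnan = ~np.isnan(df)

# pandas' own helper produces the same kind of mask
mask_notnull = pd.notnull(df)

# Both masks should be identical for a frame that holds only floats
assert mask_isnan.equals(mask_notnull)
print(mask_isnan)
```

Timings will differ by machine and library version, so the gap reported above may not reproduce exactly.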
2

There is a convenient function for the negation of isnull, called notnull:

In [11]: pd.notnull(df) 
Out[11]: 
     a  b 
0 True False 
1 True False 
2 True False 
3 True True 
4 False False 

+1 for pointing out 'notnull'. However, 'np.isnan(df)' seems to be about 8 times faster :S – root 2013-02-25 14:37:15


@root Interesting! I suspect this is partly/mainly because 'notnull' supports more 'dtypes' than just 'float'? – 2013-02-25 14:42:36
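
The dtype point in the comment above can be illustrated with a small sketch. It only shows that `np.isnan` does not accept object-dtype data while `pd.notnull` does; it does not establish that this is the cause of the timing difference. The Series `s` and the flag `isnan_failed` are made up for the illustration.

```python
import numpy as np
import pandas as pd

# An object-dtype Series mixing strings and a missing value
s = pd.Series(['fish', 'bear', np.nan], dtype=object)

# pd.notnull handles object dtype
notnull_mask = pd.notnull(s).tolist()
print(notnull_mask)  # [True, True, False]

# np.isnan is only defined for numeric dtypes and raises on object data
try:
    np.isnan(s)
    isnan_failed = False
except TypeError:
    isnan_failed = True

print(isnan_failed)  # True
```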

0

Comparing notnull() and isnan() on a somewhat malformed df:

df = pd.DataFrame({'a':[1,2,3,4,np.nan],'b':[np.nan,np.nan,np.nan,5,np.nan],'c':['fish','bear','cat','dog',np.nan]}) 

%%timeit 
legit_dexes = np.isnan(df[df<=""].astype(float)) == False 

1000 loops, best of 3: 632 us per loop

%%timeit 
legit_dexes = pd.notnull(df) 

1000 loops, best of 3: 751 us per loop

This variation, which ignores the malformed column, performs similarly:

%%timeit 
legit_dexes = np.isnan(df[df.columns[df.apply(lambda x: not np.any(x.values>=""))]]) == False 

1000 loops, best of 3: 681 us per loop
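
The comparisons against `""` in the snippets above appear to rely on Python 2 ordering numbers before strings; on Python 3, comparing a float with a string generally raises a TypeError. A possible alternative for ignoring the malformed column, sketched here under the assumption that "malformed" means "non-numeric", is to select the numeric columns with `select_dtypes` first. This is a substitute technique, not what was timed above, and the name `numeric` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, np.nan],
                   'b': [np.nan, np.nan, np.nan, 5, np.nan],
                   'c': ['fish', 'bear', 'cat', 'dog', np.nan]})

# Keep only numeric columns; the string column 'c' is dropped
numeric = df.select_dtypes(include='number')

# True where a numeric value is present, False where it is NaN
legit_dexes = ~np.isnan(numeric)
print(legit_dexes)
```

If the mask should also cover column 'c', `pd.notnull(df)` handles the mixed frame directly.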