Return a boolean DataFrame

I want to create a DataFrame of booleans in which np.nan == False and any positive real value == True.

import numpy as np 
import pandas as pd 
DF = pd.DataFrame({'a':[1,2,3,4,np.nan],'b':[np.nan,np.nan,np.nan,5,np.nan]}) 

DF.apply(bool) # Does not work 
DF.where(DF.isnull() == False) # Does not work 
DF[DF.isnull() == False] # Does not work

Source

2013-02-25 dmvianna

Odd, but it looks like negating np.isnan(df) beats pd.notnull(df) by a wide margin:

In [1]: import pandas as pd 

In [2]: import numpy as np 

In [3]: df = pd.DataFrame({'a':[1,2,3,4,np.nan],'b':[np.nan,np.nan,np.nan,5,np.nan]}) 


In [4]: ~np.isnan(df)  # use ~ to negate; unary minus on booleans is rejected by current NumPy
Out[4]: 
       a      b
0   True  False
1   True  False
2   True  False
3   True   True
4  False  False

In [5]: %timeit ~np.isnan(df)
10000 loops, best of 3: 159 us per loop

In [6]: %timeit pd.notnull(df)
1000 loops, best of 3: 1.22 ms per loop
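
Unary minus on boolean data is rejected by current NumPy, so `~` is the operator to use for negation. A minimal sketch, assuming a recent pandas/NumPy and a frame whose columns are all float, that checks the two approaches produce the same mask (the names `mask_isnan` and `mask_notnull` are only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, np.nan],
                   'b': [np.nan, np.nan, np.nan, 5, np.nan]})

# np.isnan is a NumPy ufunc; applied to an all-float DataFrame
# it returns a DataFrame of booleans, which ~ then inverts.
mask_isnan = ~np.isnan(df)

# pandas' own missing-value check, already "not null".
mask_notnull = pd.notnull(df)

# Both masks should agree element by element.
print(mask_isnan.equals(mask_notnull))
```

Timings depend heavily on the pandas and NumPy versions, so the numbers above are best re-measured locally with `%timeit`.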

Source

2013-02-25 06:55:52 root

There is a convenience function for "not isnull", called notnull:

In [11]: pd.notnull(df)
Out[11]: 
       a      b
0   True  False
1   True  False
2   True  False
3   True   True
4  False  False
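
On recent pandas versions the same check is also available as a DataFrame method. A minimal sketch, assuming pandas 0.21 or later (where, as far as I know, `notna` was introduced as an alias of `notnull`); the name `mask` is only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, np.nan],
                   'b': [np.nan, np.nan, np.nan, 5, np.nan]})

# Method form of pd.notnull(df): True where a value is present,
# False where it is NaN.
mask = df.notna()

# Every column of the result has boolean dtype.
print(mask.dtypes)
```

The result can be used directly wherever a boolean DataFrame is needed, for example `mask.sum()` to count the non-missing values per column.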

Source

2013-02-25 10:29:33

+1 for pointing out 'notnull'. However, 'np.isnan(df)' seems to be about 8 times faster :S – root 2013-02-25 14:37:15

@root Interesting! I suspect this is partly/mostly because 'notnull' supports more 'dtypes' than just 'float'? – 2013-02-25 14:42:36
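
The dtype point in the comment above can be illustrated with a small sketch. It assumes a Series explicitly created with `dtype=object` (the names `s` and `isnan_failed` are hypothetical): `np.isnan` is only defined for numeric input, while `pd.notnull` also handles object data.

```python
import numpy as np
import pandas as pd

# An object-dtype Series mixing strings and a missing value.
s = pd.Series(['fish', 'bear', np.nan], dtype=object)

# np.isnan has no implementation for object arrays, so this raises TypeError.
try:
    np.isnan(s)
    isnan_failed = False
except TypeError:
    isnan_failed = True

# pd.notnull works on any dtype and flags only the missing entry.
mask = pd.notnull(s)

print(isnan_failed)
print(mask.tolist())
```

That extra generality is one plausible reason for the speed difference, though this sketch does not measure it.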

Comparing notnull() and isnan() on a somewhat malformed df:

df = pd.DataFrame({'a':[1,2,3,4,np.nan],'b':[np.nan,np.nan,np.nan,5,np.nan],'c':['fish','bear','cat','dog',np.nan]}) 

%%timeit 
legit_dexes = np.isnan(df[df<=""].astype(float)) == False

1000 loops, best of 3: 632 us per loop

%%timeit 
legit_dexes = pd.notnull(df)

1000 loops, best of 3: 751 us per loop

This variation, which ignores the malformed columns, performs similarly:

%%timeit 
legit_dexes = np.isnan(df[df.columns[df.apply(lambda x: not np.any(x.values>=""))]]) == False

1000 loops, best of 3: 681 us per loop
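
The comparisons against `""` in the snippets above appear to rely on Python 2's ordering between numbers and strings, which Python 3 does not define. A rough sketch of a similar "ignore the non-numeric columns" idea on current pandas, using `select_dtypes` as a swapped-in technique rather than the original author's method (the names `numeric`, `legit_numeric` and `legit_all` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, np.nan],
                   'b': [np.nan, np.nan, np.nan, 5, np.nan],
                   'c': ['fish', 'bear', 'cat', 'dog', np.nan]})

# Keep only the numeric columns, so np.isnan never sees strings.
numeric = df.select_dtypes(include='number')
legit_numeric = ~np.isnan(numeric)

# pd.notnull needs no filtering and covers the string column as well.
legit_all = pd.notnull(df)

print(list(numeric.columns))
print(legit_numeric.equals(legit_all[['a', 'b']]))
```

Note that this drops column `c` from the `np.isnan` result entirely, whereas `pd.notnull` reports on all three columns.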

Source

2013-02-26 18:32:48 radikalus
