2017-02-21 51 views

Python DataFrames: for loop with if statement not working

I have a DataFrame named ES_15M_Summary, with coefficients/betas in a column titled ES_15M_Summary['Rolling_OLS_Coefficient'], as follows:

Column 'Rolling_OLS_Coefficient'

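If the column pictured above ('Rolling_OLS_Coefficient') holds a value greater than .08, I want a new column called 'Long' to hold a binary 'Y'. If the value in that column is less than .08, I want the new value to be 'NaN' or 'N' (either works).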

So I am writing a for loop to run down the column. First, I created the new column named 'Long' and set it to NaN:

ES_15M_Summary['Long'] = np.nan 

Then I wrote the following for loop:

for index, row in ES_15M_Summary.iterrows(): 
    if ES_15M_Summary['Rolling_OLS_Coefficient'] > .08: 
     ES_15M_Summary['Long'] = 'Y' 
    else: 
     ES_15M_Summary['Long'] = 'NaN' 

I get the error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). 

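...which refers to the if statement line shown above (if ... > .08:). I don't know why I am getting this error or what is wrong with the for loop. Any help is appreciated.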

Answers


I think it is better to use numpy.where:

mask = ES_15M_Summary['Rolling_OLS_Coefficient'] > .08 
ES_15M_Summary['Long'] = np.where(mask, 'Y', 'N') 

Sample:

import numpy as np
import pandas as pd

ES_15M_Summary = pd.DataFrame({'Rolling_OLS_Coefficient': [0.07, 0.01, 0.09]})
print(ES_15M_Summary)
   Rolling_OLS_Coefficient
0                     0.07
1                     0.01
2                     0.09

mask = ES_15M_Summary['Rolling_OLS_Coefficient'] > .08
ES_15M_Summary['Long'] = np.where(mask, 'Y', 'N')
print(ES_15M_Summary)
   Rolling_OLS_Coefficient Long
0                     0.07    N
1                     0.01    N
2                     0.09    Y
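The question also allowed a real missing value instead of 'N'. As a minimal sketch of that variant (the `Series.map` approach and the name `df` are my own choice here, not part of the answer above), the boolean mask can be mapped to 'Y' or `np.nan`. Passing `np.nan` straight to `np.where` alongside 'Y' would, as far as I know, produce the string 'nan' rather than a missing value, because NumPy coerces both branches to a common string dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ES_15M_Summary
df = pd.DataFrame({'Rolling_OLS_Coefficient': [0.07, 0.01, 0.09]})
mask = df['Rolling_OLS_Coefficient'] > .08

# Map True to 'Y' and False to a real missing value (not the string 'NaN')
df['Long'] = mask.map({True: 'Y', False: np.nan})

print(df)
print(df['Long'].isna().tolist())
```

With this form, `isna()` and `dropna()` treat the unflagged rows as missing, which the string 'NaN' would not allow.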

A loop, which is a very slow solution:

for index, row in ES_15M_Summary.iterrows():
    if ES_15M_Summary.loc[index, 'Rolling_OLS_Coefficient'] > .08:
        ES_15M_Summary.loc[index, 'Long'] = 'Y'
    else:
        ES_15M_Summary.loc[index, 'Long'] = 'N'
print(ES_15M_Summary)
   Rolling_OLS_Coefficient Long
0                     0.07    N
1                     0.01    N
2                     0.09    Y
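The loop above works where the one in the question did not because `.loc[index, column]` selects a single cell. In the question, `ES_15M_Summary['Rolling_OLS_Coefficient'] > .08` compares the whole column on every iteration, and the assignment `ES_15M_Summary['Long'] = 'Y'` would likewise overwrite every row at once. The sketch below reproduces the difference on a small stand-in frame; the names `df`, `whole_column`, and `single_value` are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for ES_15M_Summary
df = pd.DataFrame({'Rolling_OLS_Coefficient': [0.07, 0.01, 0.09]})

# Comparing the whole column yields a boolean Series, one value per row
whole_column = df['Rolling_OLS_Coefficient'] > .08
print(type(whole_column).__name__)

# `if` needs a single True/False, so a multi-element Series raises ValueError
error_message = None
try:
    if whole_column:
        pass
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# Selecting one cell with .loc[row_label, column] gives a scalar instead
single_value = df.loc[2, 'Rolling_OLS_Coefficient'] > .08
print(single_value)
```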

Timings

#3000 rows 
ES_15M_Summary = pd.DataFrame({'Rolling_OLS_Coefficient':[0.07,0.01,0.09] * 1000}) 
#print (ES_15M_Summary) 


def loop(df):
    # Operate on the DataFrame passed in rather than on the global variable
    for index, row in df.iterrows():
        if df.loc[index, 'Rolling_OLS_Coefficient'] > .08:
            df.loc[index, 'Long'] = 'Y'
        else:
            df.loc[index, 'Long'] = 'N'
    return df

print (loop(ES_15M_Summary)) 


In [51]: %timeit (loop(ES_15M_Summary)) 
1 loop, best of 3: 2.38 s per loop 

In [52]: %timeit ES_15M_Summary['Long'] = np.where(ES_15M_Summary['Rolling_OLS_Coefficient'] > .08, 'Y', 'N') 
1000 loops, best of 3: 555 µs per loop 
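The `%timeit` lines above need IPython. Outside IPython, a rough equivalent can be sketched with the standard-library `timeit` module, as below. The function names `with_loop` and `with_where` are hypothetical, each run works on a fresh copy of the data, and the absolute numbers will depend on the machine and the pandas version, so none are shown here.

```python
import timeit

import numpy as np
import pandas as pd

def with_loop(df):
    # Row-by-row assignment through .loc, as in the slow solution
    for index, _ in df.iterrows():
        if df.loc[index, 'Rolling_OLS_Coefficient'] > .08:
            df.loc[index, 'Long'] = 'Y'
        else:
            df.loc[index, 'Long'] = 'N'
    return df

def with_where(df):
    # Vectorized assignment over the whole column
    df['Long'] = np.where(df['Rolling_OLS_Coefficient'] > .08, 'Y', 'N')
    return df

# 3000 rows, matching the timing setup
base = pd.DataFrame({'Rolling_OLS_Coefficient': [0.07, 0.01, 0.09] * 1000})

# Both approaches should label every row identically
loop_result = with_loop(base.copy())
where_result = with_where(base.copy())
same = loop_result['Long'].tolist() == where_result['Long'].tolist()
print(same)

# Time a single run of each approach on a fresh copy
loop_seconds = timeit.timeit(lambda: with_loop(base.copy()), number=1)
where_seconds = timeit.timeit(lambda: with_where(base.copy()), number=1)
print(f"loop: {loop_seconds:.4f} s, np.where: {where_seconds:.4f} s")
```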

Thanks, I am using the for loop you provided. Much appreciated.