2017-05-22 137 views
1

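Python DataFrame: create a new column based on conditions on a string column and a float column

I have the Python DataFrame shown below. The "Flag" field is the desired column that I want to create with code.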

[Image: screenshot of the DataFrame, not preserved]

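I would like to do the following:

If "Allocation Type" is Predicted and "Activities_Counter" is greater than 10, I want to create a new column called "Flag" and label the row with 'Flag'.

Otherwise, leave the Flag row blank.

I use the following code to identify/flag rows where "Activities_Counter" is greater than 10, but I don't know how to incorporate the "Allocation Type" criterion into my code.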

Flag = []

for row in df_HA_noHA_act['Activities_Counter']:
    if row >= 10:
        Flag.append('Flag')
    else:
        Flag.append('')

df_HA_noHA_act['Flag'] = Flag

Any help is greatly appreciated!

Answers

2

You need to add the new condition with &. It is also faster to use numpy.where:

# The outer parentheses let the expression continue onto the next line
mask = ((df_HA_noHA_act["Allocation Type"] == 'Predicted') &
        (df_HA_noHA_act['Activities_Counter'] >= 10))
df_HA_noHA_act['Flag'] = np.where(mask, 'Flag', '')

import numpy as np
import pandas as pd

df_HA_noHA_act = pd.DataFrame({'Activities_Counter': [10, 2, 6, 15, 11, 18],
                               'Allocation Type': ['Historical', 'Historical', 'Predicted',
                                                   'Predicted', 'Predicted', 'Historical']})
print (df_HA_noHA_act)
   Activities_Counter Allocation Type
0                  10      Historical
1                   2      Historical
2                   6       Predicted
3                  15       Predicted
4                  11       Predicted
5                  18      Historical

mask = ((df_HA_noHA_act["Allocation Type"] == 'Predicted') &
        (df_HA_noHA_act['Activities_Counter'] >= 10))
df_HA_noHA_act['Flag'] = np.where(mask, 'Flag', '')
print (df_HA_noHA_act)
   Activities_Counter Allocation Type Flag
0                  10      Historical
1                   2      Historical
2                   6       Predicted
3                  15       Predicted Flag
4                  11       Predicted Flag
5                  18      Historical
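
Note that the question's text says "greater than 10" while its code uses `>=`, so a row with exactly 10 activities is treated differently under the two readings. Below is a minimal sketch of the strict variant. The frame `df` and its values are made up for illustration (not the asker's data), and it uses `DataFrame.loc` assignment as one possible alternative to `np.where`:

```python
import pandas as pd

# Hypothetical sample data; the first row (exactly 10) is the edge case.
df = pd.DataFrame({
    'Activities_Counter': [10, 2, 15],
    'Allocation Type': ['Predicted', 'Historical', 'Predicted'],
})

# Strict comparison: a count of exactly 10 does not qualify.
strict_mask = (df['Allocation Type'] == 'Predicted') & (df['Activities_Counter'] > 10)

# Start with an empty flag everywhere, then fill only the matching rows.
df['Flag'] = ''
df.loc[strict_mask, 'Flag'] = 'Flag'

print(df)
```

With `>=` instead of `>`, the first row would be flagged as well, so it is worth deciding which boundary is intended.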

A slower solution with a loop:

Flag = []
for i, row in df_HA_noHA_act.iterrows():
    if (row['Activities_Counter'] >= 10) and (row["Allocation Type"] == 'Predicted'):
        Flag.append('Flag')
    else:
        Flag.append('')
df_HA_noHA_act['Flag'] = Flag
print (df_HA_noHA_act)
   Activities_Counter Allocation Type Flag
0                  10      Historical
1                   2      Historical
2                   6       Predicted
3                  15       Predicted Flag
4                  11       Predicted Flag
5                  18      Historical
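
If a loop is preferred for readability, one possible middle ground is to iterate over the two columns with `zip` instead of `iterrows`, which avoids building a Series object for every row. This is only a sketch under that assumption; it was not part of the timings in this answer, and `df` and `flags` are illustrative names:

```python
import pandas as pd

# Same sample values as above, rebuilt so the snippet runs on its own.
df = pd.DataFrame({
    'Activities_Counter': [10, 2, 6, 15, 11, 18],
    'Allocation Type': ['Historical', 'Historical', 'Predicted',
                        'Predicted', 'Predicted', 'Historical'],
})

# Walk both columns in parallel and apply the same two conditions per row.
flags = [
    'Flag' if (count >= 10 and alloc == 'Predicted') else ''
    for count, alloc in zip(df['Activities_Counter'], df['Allocation Type'])
]
df['Flag'] = flags

print(df)
```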

Timings:

df_HA_noHA_act = pd.DataFrame({'Activities_Counter': [10, 2, 6, 15, 11, 18],
                               'Allocation Type': ['Historical', 'Historical', 'Predicted',
                                                   'Predicted', 'Predicted', 'Historical']})
print (df_HA_noHA_act)
# Repeat the 6 sample rows 1000 times for the timing test
df_HA_noHA_act = pd.concat([df_HA_noHA_act]*1000).reset_index(drop=True)
#[6000 rows x 2 columns]

In [187]: %%timeit 
    ...: df_HA_noHA_act['Flag1'] = np.where((df_HA_noHA_act["Allocation Type"] == 'Predicted') & (df_HA_noHA_act['Activities_Counter'] >= 10), 'Flag', '') 
    ...: 
100 loops, best of 3: 1.89 ms per loop 

In [188]: %%timeit
    ...: Flag = []
    ...: for i, row in df_HA_noHA_act.iterrows():
    ...:     if (row['Activities_Counter'] >= 10) and (row["Allocation Type"] == 'Predicted'):
    ...:         Flag.append('Flag')
    ...:     else:
    ...:         Flag.append('')
    ...: df_HA_noHA_act['Flag'] = Flag
    ...:
    ...:
1 loop, best of 3: 381 ms per loop
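
The `%%timeit` cells above only run inside IPython. To repeat the comparison in a plain Python script, something like the following sketch with the standard `timeit` module should work. The helper names `flag_vectorized` and `flag_iterrows` are made up here, and the numbers it prints will differ from the ones above depending on machine and library versions:

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({
    'Activities_Counter': [10, 2, 6, 15, 11, 18],
    'Allocation Type': ['Historical', 'Historical', 'Predicted',
                        'Predicted', 'Predicted', 'Historical'],
})
# 6 rows repeated 1000 times -> 6000 rows, as in the timings above.
df = pd.concat([base] * 1000).reset_index(drop=True)


def flag_vectorized(frame):
    """Build the flags with a boolean mask and numpy.where."""
    mask = (frame['Allocation Type'] == 'Predicted') & (frame['Activities_Counter'] >= 10)
    return np.where(mask, 'Flag', '')


def flag_iterrows(frame):
    """Build the flags row by row with iterrows."""
    flags = []
    for _, row in frame.iterrows():
        if (row['Activities_Counter'] >= 10) and (row['Allocation Type'] == 'Predicted'):
            flags.append('Flag')
        else:
            flags.append('')
    return flags


# Both approaches must agree before their speed is compared.
assert list(flag_vectorized(df)) == flag_iterrows(df)

runs = 10
t_vec = timeit.timeit(lambda: flag_vectorized(df), number=runs) / runs
t_loop = timeit.timeit(lambda: flag_iterrows(df), number=1)
print(f"vectorized: {t_vec * 1e3:.2f} ms per call")
print(f"iterrows:   {t_loop * 1e3:.2f} ms per call")
```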
+0

Works perfectly! Thank you so much :) – PineNuts0

+0

Is timing an integral part of computer science, so that you can get your code to run faster? – PineNuts0

+0

I think this is the fastest solution; I tested it on my PC. – jezrael