Python DataFrame: create a new column based on both a string column and a float column

I have the Python DataFrame below. The "Flag" field is the desired column that I want to create with code.

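I want to do the following:

If "Allocation Type" is Predicted and "Activities_Counter" is greater than 10, I want to create a new column called "Flag" and label the row with "Flag". Otherwise, leave the Flag row blank.

I use the following code to identify and flag rows where "Activities_Counter" is greater than 10, but I don't know how to incorporate the "Allocation Type" criterion into my code.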

Flag = []

for row in df_HA_noHA_act['Activities_Counter']:
    if row >= 10:
        Flag.append('Flag')
    else:
        Flag.append('')

df_HA_noHA_act['Flag'] = Flag

Any help is greatly appreciated!


2017-05-22 PineNuts0

You need to add the new condition with &. It is also faster to use numpy.where:

import numpy as np

# Outer parentheses let the expression continue onto the next line
mask = ((df_HA_noHA_act["Allocation Type"] == 'Predicted') &
        (df_HA_noHA_act['Activities_Counter'] >= 10))
df_HA_noHA_act['Flag'] = np.where(mask, 'Flag', '')
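
A short aside on why the condition is joined with & rather than Python's and. The sketch below uses a hypothetical two-row frame (not the asker's data) to show that & combines two boolean Series row by row, whereas and asks pandas for a single truth value of a whole Series and fails. Because & binds more tightly than == and >=, each comparison also needs its own parentheses when written inline.

```python
import pandas as pd

# Hypothetical two-row frame, used only for illustration
df = pd.DataFrame({'Activities_Counter': [15, 6],
                   'Allocation Type': ['Predicted', 'Predicted']})

is_predicted = df['Allocation Type'] == 'Predicted'
is_busy = df['Activities_Counter'] >= 10

# Element-wise AND: one boolean per row
both = is_predicted & is_busy
print(both.tolist())

# `and` needs a single truth value for each Series, which pandas refuses
try:
    is_predicted and is_busy
    raised = False
except ValueError as exc:
    raised = True
    print('ValueError:', exc)
```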

import numpy as np
import pandas as pd

df_HA_noHA_act = pd.DataFrame({'Activities_Counter': [10, 2, 6, 15, 11, 18],
                               'Allocation Type': ['Historical', 'Historical', 'Predicted',
                                                   'Predicted', 'Predicted', 'Historical']})
print (df_HA_noHA_act)
   Activities_Counter Allocation Type
0                  10      Historical
1                   2      Historical
2                   6       Predicted
3                  15       Predicted
4                  11       Predicted
5                  18      Historical

mask = ((df_HA_noHA_act["Allocation Type"] == 'Predicted') &
        (df_HA_noHA_act['Activities_Counter'] >= 10))
df_HA_noHA_act['Flag'] = np.where(mask, 'Flag', '')
print (df_HA_noHA_act)
   Activities_Counter Allocation Type  Flag
0                  10      Historical
1                   2      Historical
2                   6       Predicted
3                  15       Predicted  Flag
4                  11       Predicted  Flag
5                  18      Historical
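
One detail worth checking: the question is worded as "greater than 10", while the code in this thread compares with >= 10. The sample data never has a Predicted row with exactly 10 activities, so the output above is the same either way. A minimal sketch with hypothetical rows placed around the threshold shows where the two readings diverge; the column names Flag_inclusive and Flag_strict are made up for the comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical rows sitting just below, on, and just above the threshold
df = pd.DataFrame({'Activities_Counter': [9, 10, 11],
                   'Allocation Type': ['Predicted', 'Predicted', 'Predicted']})

is_predicted = df['Allocation Type'] == 'Predicted'

# >= 10 flags the row with exactly 10 activities, > 10 does not
df['Flag_inclusive'] = np.where(is_predicted & (df['Activities_Counter'] >= 10), 'Flag', '')
df['Flag_strict'] = np.where(is_predicted & (df['Activities_Counter'] > 10), 'Flag', '')

print(df[['Activities_Counter', 'Flag_inclusive', 'Flag_strict']])
```

Pick whichever operator matches the intended rule.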

A slower solution using a loop:

Flag = []
for i, row in df_HA_noHA_act.iterrows():
    if (row['Activities_Counter'] >= 10) and (row["Allocation Type"] == 'Predicted'):
        Flag.append('Flag')
    else:
        Flag.append('')
df_HA_noHA_act['Flag'] = Flag
print (df_HA_noHA_act)
   Activities_Counter Allocation Type  Flag
0                  10      Historical
1                   2      Historical
2                   6       Predicted
3                  15       Predicted  Flag
4                  11       Predicted  Flag
5                  18      Historical
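
If an explicit loop is still preferred, one alternative (not part of the original answer) is to iterate over the two columns with zip instead of iterrows, which avoids building a Series object for every row. A minimal sketch on the same sample data, where the shorter name df is used only for brevity:

```python
import pandas as pd

df = pd.DataFrame({'Activities_Counter': [10, 2, 6, 15, 11, 18],
                   'Allocation Type': ['Historical', 'Historical', 'Predicted',
                                       'Predicted', 'Predicted', 'Historical']})

# Walk both columns in lockstep and build the flag list in one pass
df['Flag'] = ['Flag' if (count >= 10 and alloc == 'Predicted') else ''
              for count, alloc in zip(df['Activities_Counter'], df['Allocation Type'])]

print(df['Flag'].tolist())
# ['', '', '', 'Flag', 'Flag', '']
```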

Timings:

df_HA_noHA_act = pd.DataFrame({'Activities_Counter': [10, 2, 6, 15, 11, 18],
                               'Allocation Type': ['Historical', 'Historical', 'Predicted',
                                                   'Predicted', 'Predicted', 'Historical']})
# Repeat the 6 sample rows 1000 times to get a larger frame for timing
df_HA_noHA_act = pd.concat([df_HA_noHA_act]*1000).reset_index(drop=True)
print (df_HA_noHA_act)
#[6000 rows x 2 columns]

In [187]: %%timeit
     ...: df_HA_noHA_act['Flag1'] = np.where((df_HA_noHA_act["Allocation Type"] == 'Predicted') & (df_HA_noHA_act['Activities_Counter'] >= 10), 'Flag', '')
     ...:
100 loops, best of 3: 1.89 ms per loop

In [188]: %%timeit
     ...: Flag = []
     ...: for i, row in df_HA_noHA_act.iterrows():
     ...:     if (row['Activities_Counter'] >= 10) and (row["Allocation Type"] == 'Predicted'):
     ...:         Flag.append('Flag')
     ...:     else:
     ...:         Flag.append('')
     ...: df_HA_noHA_act['Flag'] = Flag
     ...:
1 loop, best of 3: 381 ms per loop
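
The %%timeit cells above only run in IPython or Jupyter. For a plain Python script, a comparable measurement can be sketched with the standard-library timeit module. The helper names flag_vectorized and flag_loop and the repeat counts are arbitrary choices for this sketch, and the absolute numbers will differ from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({'Activities_Counter': [10, 2, 6, 15, 11, 18],
                   'Allocation Type': ['Historical', 'Historical', 'Predicted',
                                       'Predicted', 'Predicted', 'Historical']})
df = pd.concat([df] * 1000).reset_index(drop=True)  # 6000 rows


def flag_vectorized(frame):
    """Vectorized version: boolean mask plus numpy.where."""
    mask = ((frame['Allocation Type'] == 'Predicted') &
            (frame['Activities_Counter'] >= 10))
    return np.where(mask, 'Flag', '')


def flag_loop(frame):
    """Row-by-row version using iterrows."""
    flags = []
    for _, row in frame.iterrows():
        if row['Activities_Counter'] >= 10 and row['Allocation Type'] == 'Predicted':
            flags.append('Flag')
        else:
            flags.append('')
    return flags


# Both approaches should agree before their speed is compared
assert list(flag_vectorized(df)) == flag_loop(df)

# Average wall-clock time per call; the repeat counts are arbitrary
t_vec = timeit.timeit(lambda: flag_vectorized(df), number=10) / 10
t_loop = timeit.timeit(lambda: flag_loop(df), number=3) / 3
print(f'np.where: {t_vec * 1e3:.2f} ms per call')
print(f'iterrows: {t_loop * 1e3:.2f} ms per call')
```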


2017-05-22 10:13:30 jezrael

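Works perfectly! Thank you so much :) – PineNuts0

Is timing a part of computer science where you can get your code to run faster? – PineNuts0

I think this is the fastest solution; I tested it on my PC. – jezrael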
