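I am altering some applicant transaction data and need to create a new flag column (labeled "DESIRED FLAG" in my example). However, I can't figure out the right loop/apply approach, because the logic below can vary in many different ways. What is the best pandas apply/loop approach for this situation?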
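In a perfect world, the history of a continuous application process looks like this, with every "Event Status" set to "Completed":
- On-Site Interview Kick Off -> Schedule Interviews -> Decision; OR
- Phone Interview Kick Off -> Schedule Interviews -> Decision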
Of course, an applicant can go through many phone and on-site interviews during their application process.
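As the example below shows, a "Schedule Interviews" event is sometimes canceled. In those cases I need to remove that step and the subsequent steps tied to it: the "Schedule Interviews" itself, the "Decision", and the next "On-Site Interview Kick Off" or "Phone Interview Kick Off". There are also sometimes other event statuses, such as the "Manually Skipped" Decision in the data.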
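Reading the rule above together with the DESIRED FLAG column, it behaves like a small state machine over each applicant's date-ordered events: a CANCELED "Schedule Interviews" opens a removal run, and the next "... Interview Kick Off" closes it (and is itself removed). The sketch below is an interpretation inferred from the sample data, not a confirmed spec; `flag_events` is a hypothetical helper, and it assumes that a cancellation with no later kick-off removes everything to the end of that applicant's history.

```python
def flag_events(events):
    """Flag one applicant's events, given as (event_name, event_status) tuples in date order.

    Assumed rule (inferred from DESIRED FLAG): a CANCELED "Schedule Interviews"
    starts a removal run covering every following event up to and including
    the next "... Interview Kick Off".
    """
    flags = []
    removing = False
    for name, status in events:
        if name == "Schedule Interviews" and status == "CANCELED":
            removing = True  # open a removal run
            flags.append("Remove")
        elif removing:
            flags.append("Remove")
            if name.endswith("Interview Kick Off"):
                removing = False  # the kick-off closes the run
        else:
            flags.append("Keep")
    return flags


# Employee 300's events from the question's sample data
employee_300 = [
    ("Job Apply", "Completed"),
    ("Phone Interview Kick Off", "Completed"),
    ("Schedule Interviews", "CANCELED"),
    ("Decision", "Completed"),
    ("On-Site Interview Kick Off", "Completed"),
    ("Schedule Interviews", "Completed"),
    ("Decision", "Completed"),
]
print(flag_events(employee_300))
# ['Keep', 'Keep', 'Remove', 'Remove', 'Remove', 'Keep', 'Keep']
```

Because the function only looks at event names and statuses, rows like the "Manually Skipped" Decision are removed simply because they fall inside an open run.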
I have other kinds of cases that I also need to flag, so I need to keep the original DataFrame intact and only add the new column.
import pandas as pd
data = {'Employee ID': ["100","100", "100", "100","100","100","100","100","100","100","200", "200", "200","200","200","200","200","300","300", "300", "300","300","300","300"],
'Completed On Date': ["2009-01-01","2010-01-01","2011-06-05","2012-07-01","2013-01-01","2014-01-01","2015-01-01","2016-01-01","2017-01-01","2018-01-01","2010-01-01","2011-06-05","2012-07-01","2012-08-15","2013-01-01","2014-01-01","2015-01-01","2009-01-01","2010-01-01","2011-06-05","2012-07-01","2013-01-01","2014-01-01","2015-01-01"],
'Event': ["Decision","On-Site Interview Kick Off","Schedule Interviews","Decision","On-Site Interview Kick Off","Schedule Interviews","Decision","Phone Interview Kick Off","Schedule Interviews","Decision","On-Site Interview Kick Off","Schedule Interviews","Decision","Decision","Phone Interview Kick Off","Schedule Interviews","Decision","Job Apply","Phone Interview Kick Off","Schedule Interviews","Decision","On-Site Interview Kick Off","Schedule Interviews","Decision"],
'Event Status': ["Completed","Completed","CANCELED","Completed","Completed","Completed","Completed","Completed","Completed","Completed","Completed","CANCELED","Manually Skipped","Completed","Completed","Completed","Completed","Completed","Completed","CANCELED","Completed","Completed","Completed","Completed"],
'DESIRED FLAG': ["Keep","Keep","Remove","Remove","Remove","Keep","Keep","Keep","Keep","Keep","Keep","Remove","Remove","Remove","Remove","Keep","Keep","Keep","Keep","Remove","Remove","Remove","Keep","Keep"]}
df = pd.DataFrame(data, columns=['Employee ID','Completed On Date','Event','Event Status','DESIRED FLAG'])
df = df.sort_values(by=['Employee ID', 'Completed On Date'])
df
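As an alternative to an explicit loop or `apply`, the same inferred rule can be expressed with grouped forward-fill and shift, adding only one new column and leaving the original ones untouched. This is a sketch under the assumption that the rule is "a CANCELED Schedule Interviews opens a removal run that ends with, and includes, the next Interview Kick Off"; the column name `Flag` is hypothetical, and the logic relies on rows being sorted by date within each Employee ID.

```python
import pandas as pd

rows = [
    # (Employee ID, Completed On Date, Event, Event Status, DESIRED FLAG)
    ("100", "2009-01-01", "Decision", "Completed", "Keep"),
    ("100", "2010-01-01", "On-Site Interview Kick Off", "Completed", "Keep"),
    ("100", "2011-06-05", "Schedule Interviews", "CANCELED", "Remove"),
    ("100", "2012-07-01", "Decision", "Completed", "Remove"),
    ("100", "2013-01-01", "On-Site Interview Kick Off", "Completed", "Remove"),
    ("100", "2014-01-01", "Schedule Interviews", "Completed", "Keep"),
    ("100", "2015-01-01", "Decision", "Completed", "Keep"),
    ("100", "2016-01-01", "Phone Interview Kick Off", "Completed", "Keep"),
    ("100", "2017-01-01", "Schedule Interviews", "Completed", "Keep"),
    ("100", "2018-01-01", "Decision", "Completed", "Keep"),
    ("200", "2010-01-01", "On-Site Interview Kick Off", "Completed", "Keep"),
    ("200", "2011-06-05", "Schedule Interviews", "CANCELED", "Remove"),
    ("200", "2012-07-01", "Decision", "Manually Skipped", "Remove"),
    ("200", "2012-08-15", "Decision", "Completed", "Remove"),
    ("200", "2013-01-01", "Phone Interview Kick Off", "Completed", "Remove"),
    ("200", "2014-01-01", "Schedule Interviews", "Completed", "Keep"),
    ("200", "2015-01-01", "Decision", "Completed", "Keep"),
    ("300", "2009-01-01", "Job Apply", "Completed", "Keep"),
    ("300", "2010-01-01", "Phone Interview Kick Off", "Completed", "Keep"),
    ("300", "2011-06-05", "Schedule Interviews", "CANCELED", "Remove"),
    ("300", "2012-07-01", "Decision", "Completed", "Remove"),
    ("300", "2013-01-01", "On-Site Interview Kick Off", "Completed", "Remove"),
    ("300", "2014-01-01", "Schedule Interviews", "Completed", "Keep"),
    ("300", "2015-01-01", "Decision", "Completed", "Keep"),
]
cols = ["Employee ID", "Completed On Date", "Event", "Event Status", "DESIRED FLAG"]
# Row order matters: the state below is carried forward in date order per employee.
df = pd.DataFrame(rows, columns=cols).sort_values(["Employee ID", "Completed On Date"])

emp = df["Employee ID"]
# Assumed rule: a CANCELED "Schedule Interviews" opens a removal run ...
opens = (df["Event"] == "Schedule Interviews") & (df["Event Status"] == "CANCELED")
# ... and the next "... Interview Kick Off" closes it.
closes = df["Event"].str.endswith("Interview Kick Off")

# 1.0 where a run opens, 0.0 where a kick-off closes it, NaN elsewhere;
# forward-fill within each employee so the state carries to later rows.
marker = pd.Series(float("nan"), index=df.index)
marker.loc[opens] = 1.0
marker.loc[closes] = 0.0
state = marker.groupby(emp).ffill().fillna(0.0)

# The closing kick-off itself is removed if the run was open on the previous row.
prev_state = state.groupby(emp).shift(1).fillna(0.0)
remove = (state == 1.0) | (closes & (prev_state == 1.0))

# Only a new column is added; the original columns stay as they were.
df["Flag"] = remove.map({True: "Remove", False: "Keep"})

print((df["Flag"] == df["DESIRED FLAG"]).all())  # True
```

The grouped `ffill` and `shift` keep the state from leaking from one Employee ID into the next, which is what a per-applicant loop would otherwise have to handle by hand.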
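It would be very helpful if you could post what the desired output looks like. – pshep123
See the 'DESIRED FLAG' column. That's what the output should look like. Thanks! – Christopher
Got it. It would help to present it as a DataFrame, but maybe that's just me. – pshep123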