How to cast a time column and find timedelta with a condition in Python pandas

I have a time column that is stored as a plain object rather than a time type, and I can't convert it to timedelta or datetime.

 Time    msg 
12:29:36.306000  Setup 
12:29:36.507000  Alerting 
12:29:38.207000  Service 
12:29:39.194000  Setup 
12:30:05.773000  Alerting 
12:30:06.205000  Service 
12:32:07.315000  Setup 
12:32:17.194000  Service 
12:32:26.889000  Setup 
12:36:06.274000  Alerting 
12:36:08.523000  Service 
12:37:59.200000  Setup 
12:47:10.652000  Alerting 
12:47:43.921000  Setup

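When I run df.info(), the 'Time' column is listed as a non-null object, so I can't convert it to timedelta or datetime (which is clearly why the conversion fails). What is the way to find the difference between consecutive msg rows (a time delta), but only where more than 5 seconds have passed? Expected output: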

 Time    msg   diff 
12:29:36.306000  Setup   
12:29:36.507000  Alerting  
12:29:38.207000  Service 
12:29:39.194000  Setup 
12:30:05.773000  Alerting 
12:30:06.205000  Service 
12:32:07.315000  Setup 
12:32:17.194000  Service 
12:32:26.889000  Setup 
12:36:06.274000  Alerting 6.30*** 
12:36:08.523000  Service  
12:37:59.200000  Setup 
12:47:10.652000  Alerting 11.02***  
12:47:43.921000  Setup

I have tried something like this:

df['diff'] = (df['Time'] - df['Time'].shift()).fillna(0)

But I don't know how to write the 5-second condition.

2017-08-15 jovicbg

If you use df['Time'] = pd.to_timedelta(df['Time']), does it return an error? – jezrael

Yes. ValueError: Invalid type for timedelta scalar: – jovicbg

Then use df['Time'] = pd.to_timedelta(df['Time'].astype(str)) – jezrael
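The exchange above can be reproduced with a minimal sketch. It assumes the Time column holds datetime.time objects; the thread does not show the actual element type, so that is an assumption, and the name raw and the three sample values are only illustrative. The exception class raised by the direct call may differ between pandas versions, so both ValueError and TypeError are caught:

```python
import datetime as dt
import pandas as pd

# Assumption: the object column holds datetime.time values.
raw = pd.Series([
    dt.time(12, 29, 36, 306000),
    dt.time(12, 29, 36, 507000),
    dt.time(12, 29, 38, 207000),
])

# Direct conversion, which the question reports as failing.
try:
    pd.to_timedelta(raw)
except (TypeError, ValueError) as exc:
    print(f"direct conversion failed: {type(exc).__name__}")

# Casting to str yields 'HH:MM:SS.ffffff' strings, which to_timedelta can parse.
converted = pd.to_timedelta(raw.astype(str))
print(converted.dtype)
```

After the cast, ordinary timedelta arithmetic such as converted.diff() becomes available.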

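I think you need to convert to str first and then call to_timedelta.

Then get the diff and compare it with 5 s.

Finally, create the new column with where and the mask: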

# Cast to str first, then parse the strings as timedeltas
df['Time'] = pd.to_timedelta(df['Time'].astype(str))

# Gap to the previous row, and a flag for gaps longer than 5 seconds
df['diff'] = df['Time'].diff()
df['mask'] = df['diff'] > pd.Timedelta(5, unit='s')
print(df)
               Time       msg             diff   mask
0   12:29:36.306000     Setup              NaT  False
1   12:29:36.507000  Alerting  00:00:00.201000  False
2   12:29:38.207000   Service  00:00:01.700000  False
3   12:29:39.194000     Setup  00:00:00.987000  False
4   12:30:05.773000  Alerting  00:00:26.579000   True
5   12:30:06.205000   Service  00:00:00.432000  False
6   12:32:07.315000     Setup  00:02:01.110000   True
7   12:32:17.194000   Service  00:00:09.879000   True
8   12:32:26.889000     Setup  00:00:09.695000   True
9   12:36:06.274000  Alerting  00:03:39.385000   True
10  12:36:08.523000   Service  00:00:02.249000  False
11  12:37:59.200000     Setup  00:01:50.677000   True
12  12:47:10.652000  Alerting  00:09:11.452000   True
13  12:47:43.921000     Setup  00:00:33.269000   True
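The subtraction with shift() from the question and the diff() call used in this answer should produce the same gaps. A small check on three of the sample times, given here as strings for brevity (the variable names are illustrative, not from the thread):

```python
import pandas as pd

times = pd.to_timedelta(pd.Series([
    "12:29:39.194000",
    "12:30:05.773000",
    "12:30:06.205000",
]))

by_shift = times - times.shift()   # the question's approach
by_diff = times.diff()             # the answer's approach

# Both leave NaT in the first row, since it has no predecessor.
print(by_shift.equals(by_diff))

# The 5-second condition is an elementwise comparison against a Timedelta;
# the NaT in the first row compares as False.
over_5s = by_diff > pd.Timedelta(seconds=5)
print(over_5s.tolist())
```

If the first-row NaT needs a value, filling with pd.Timedelta(0) keeps the column as a timedelta.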

# Same conversion, but keep only the gaps longer than 5 seconds (NaT elsewhere)
df['Time'] = pd.to_timedelta(df['Time'].astype(str))
diff = df['Time'].diff()
mask = diff > pd.Timedelta(5, unit='s')
df['new'] = diff.where(mask)
print(df)
               Time       msg              new
0   12:29:36.306000     Setup              NaT
1   12:29:36.507000  Alerting              NaT
2   12:29:38.207000   Service              NaT
3   12:29:39.194000     Setup              NaT
4   12:30:05.773000  Alerting  00:00:26.579000
5   12:30:06.205000   Service              NaT
6   12:32:07.315000     Setup  00:02:01.110000
7   12:32:17.194000   Service  00:00:09.879000
8   12:32:26.889000     Setup  00:00:09.695000
9   12:36:06.274000  Alerting  00:03:39.385000
10  12:36:08.523000   Service              NaT
11  12:37:59.200000     Setup  00:01:50.677000
12  12:47:10.652000  Alerting  00:09:11.452000
13  12:47:43.921000     Setup  00:00:33.269000
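Putting the steps together, here is a self-contained sketch that runs on the question's sample data. It assumes the Time column starts out as datetime.time objects, which the thread does not confirm. The helper name gaps_over and its threshold parameter are hypothetical, introduced only for illustration.

```python
import datetime
import pandas as pd

# Sample rows from the question.
rows = [
    ("12:29:36.306000", "Setup"),
    ("12:29:36.507000", "Alerting"),
    ("12:29:38.207000", "Service"),
    ("12:29:39.194000", "Setup"),
    ("12:30:05.773000", "Alerting"),
    ("12:30:06.205000", "Service"),
    ("12:32:07.315000", "Setup"),
    ("12:32:17.194000", "Service"),
    ("12:32:26.889000", "Setup"),
    ("12:36:06.274000", "Alerting"),
    ("12:36:08.523000", "Service"),
    ("12:37:59.200000", "Setup"),
    ("12:47:10.652000", "Alerting"),
    ("12:47:43.921000", "Setup"),
]
df = pd.DataFrame(rows, columns=["Time", "msg"])

# Assumption: mimic the question's object column with datetime.time values.
df["Time"] = df["Time"].map(datetime.time.fromisoformat)


def gaps_over(frame, threshold):
    """Return the gap to the previous row, keeping only gaps above threshold."""
    elapsed = pd.to_timedelta(frame["Time"].astype(str))
    gap = elapsed.diff()
    # where() keeps values where the condition holds and puts NaT elsewhere.
    return gap.where(gap > threshold)


df["new"] = gaps_over(df, pd.Timedelta(seconds=5))
print(df)
```

Passing a different threshold, for example pd.Timedelta(minutes=1), would keep only the longer gaps.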

2017-08-15 14:00:31 jezrael

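It depends on the problem. Maybe yes, maybe there is no need to create a new question. – jezrael

If you don't mind, I have one more question; I can open a new question for it. What if I need to apply this 5-second condition only when there are two 'Setup' values in msg with no 'Alerting' value between them? If Alerting falls between two Setup values in msg, then just compute the timedelta, because that is normal. For example, rows 7 and 8 would be NaT in your code, but the others would have a timedelta. – jovicbg

That doesn't seem so easy; maybe you can create a new question. Why not select rows '6,7,8'? – jezrael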
