2017-08-15 76 views

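I have a time column whose values are not a pandas time type, and I cannot convert it to timedelta or datetime. How do I cast a time column and find the timedelta under a condition in Python pandas?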

 Time    msg 
12:29:36.306000  Setup 
12:29:36.507000  Alerting 
12:29:38.207000  Service 
12:29:39.194000  Setup 
12:30:05.773000  Alerting 
12:30:06.205000  Service 
12:32:07.315000  Setup 
12:32:17.194000  Service 
12:32:26.889000  Setup 
12:36:06.274000  Alerting 
12:36:08.523000  Service 
12:37:59.200000  Setup 
12:47:10.652000  Alerting 
12:47:43.921000  Setup 

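When I run df.info(), the 'Time' column is reported as a non-null object, and I cannot convert it to timedelta or datetime (it is fairly clear why that fails). So what is the way to find the difference (time delta) between consecutive msg rows, but only where the gap is more than 5 seconds (shorter gaps should be skipped)? Desired output: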

 Time    msg   diff 
12:29:36.306000  Setup   
12:29:36.507000  Alerting  
12:29:38.207000  Service 
12:29:39.194000  Setup 
12:30:05.773000  Alerting 
12:30:06.205000  Service 
12:32:07.315000  Setup 
12:32:17.194000  Service 
12:32:26.889000  Setup 
12:36:06.274000  Alerting 6.30*** 
12:36:08.523000  Service  
12:37:59.200000  Setup 
12:47:10.652000  Alerting 11.02***  
12:47:43.921000  Setup  

I have tried something like this:

df['diff'] = (df['Time'] - df['Time'].shift()).fillna(0)

But I don't know how to write the 5-second condition.

If you use `df['Time'] = pd.to_timedelta(df['Time'])`, does it return an error? – jezrael

Yes. ValueError: Invalid type for timedelta scalar: – jovicbg

Then use `df['Time'] = pd.to_timedelta(df['Time'].astype(str))` – jezrael
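For readers who want to reproduce the exchange above, here is a minimal sketch. It assumes the 'Time' column holds `datetime.time` objects, which the thread suggests but does not confirm, and the two-row frame is made-up sample data. The exact exception type may differ between pandas versions, so both `TypeError` and `ValueError` are caught.

```python
import datetime as dt

import pandas as pd

# Hypothetical two-row sample mirroring the first rows of the question's data
df = pd.DataFrame({
    "Time": [dt.time(12, 29, 36, 306000), dt.time(12, 29, 36, 507000)],
    "msg": ["Setup", "Alerting"],
})

print(df["Time"].dtype)  # object

# Direct conversion of datetime.time objects is expected to fail
try:
    pd.to_timedelta(df["Time"])
except (TypeError, ValueError) as exc:
    print(type(exc).__name__, exc)

# Going through the string form ("12:29:36.306000") gives parseable input
converted = pd.to_timedelta(df["Time"].astype(str))
print(converted)
```

The string detour works because `str()` of a `datetime.time` yields `HH:MM:SS.ffffff`, a format `pd.to_timedelta` can parse.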

Answer


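I think you first need to convert to str and then call to_timedelta.

Then get the diff and compare it with 5s.

Finally, build the new column by filtering the differences with the boolean mask: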

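import pandas as pd

# the values cannot be converted directly, so go through str first
df['Time'] = pd.to_timedelta(df['Time'].astype(str))

# difference between consecutive rows
df['diff'] = df['Time'].diff()
# True where the gap exceeds 5 seconds
df['mask'] = df['diff'] > pd.Timedelta(5, unit='s')
print(df)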
               Time       msg             diff   mask
0   12:29:36.306000     Setup              NaT  False
1   12:29:36.507000  Alerting  00:00:00.201000  False
2   12:29:38.207000   Service  00:00:01.700000  False
3   12:29:39.194000     Setup  00:00:00.987000  False
4   12:30:05.773000  Alerting  00:00:26.579000   True
5   12:30:06.205000   Service  00:00:00.432000  False
6   12:32:07.315000     Setup  00:02:01.110000   True
7   12:32:17.194000   Service  00:00:09.879000   True
8   12:32:26.889000     Setup  00:00:09.695000   True
9   12:36:06.274000  Alerting  00:03:39.385000   True
10  12:36:08.523000   Service  00:00:02.249000  False
11  12:37:59.200000     Setup  00:01:50.677000   True
12  12:47:10.652000  Alerting  00:09:11.452000   True
13  12:47:43.921000     Setup  00:00:33.269000   True

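# astype(str) is needed here too if Time has not been converted yet
df['Time'] = pd.to_timedelta(df['Time'].astype(str))
diff = df['Time'].diff()
mask = diff > pd.Timedelta(5, unit='s')
# keep the difference only where it exceeds 5 seconds, NaT elsewhere
df['new'] = diff.where(mask)
print(df)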
               Time       msg              new
0   12:29:36.306000     Setup              NaT
1   12:29:36.507000  Alerting              NaT
2   12:29:38.207000   Service              NaT
3   12:29:39.194000     Setup              NaT
4   12:30:05.773000  Alerting  00:00:26.579000
5   12:30:06.205000   Service              NaT
6   12:32:07.315000     Setup  00:02:01.110000
7   12:32:17.194000   Service  00:00:09.879000
8   12:32:26.889000     Setup  00:00:09.695000
9   12:36:06.274000  Alerting  00:03:39.385000
10  12:36:08.523000   Service              NaT
11  12:37:59.200000     Setup  00:01:50.677000
12  12:47:10.652000  Alerting  00:09:11.452000
13  12:47:43.921000     Setup  00:00:33.269000
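
The desired output in the question shows plain numbers rather than timedeltas. If a numeric column is preferred, one option is to convert the differences with `.dt.total_seconds()` before filtering. This is only a sketch: the three-row series is made-up sample data, and it assumes seconds are the wanted unit, which the question does not state.

```python
import pandas as pd

# Hypothetical sample: three of the timestamps from the question, as strings
times = pd.to_timedelta(
    pd.Series(["12:29:36.306000", "12:29:36.507000", "12:30:05.773000"])
)

# Gap to the previous row, expressed as float seconds (NaN for the first row)
seconds = times.diff().dt.total_seconds()

# Keep only gaps longer than 5 seconds; everything else becomes NaN
long_gaps = seconds.where(seconds > 5)
print(long_gaps)
```

With real data, `times` would be the converted `df['Time']` column and `long_gaps` could be assigned back as a new column.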

That depends on the question. Maybe yes, or maybe there is no need to create a new topic. – jezrael

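If you don't mind, I have one more question. I can create a new question for it. What if I need to apply this 5-second condition only when there are two 'Setup' values in msg with no 'Alerting' value between them? If Alerting sits between two Setup values in msg, then just calculate the timedelta, because that is normal. For example, rows 7 and 8 in your code would be NaT, but the others would have a timedelta. – jovicbg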

That doesn't seem so easy, so maybe you could create a new question. Why aren't rows `6,7,8` selected? – jezrael