2016-05-31 126 views

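pandas: truncating a DataFrame at a time gap

I want to keep the last few rows, but cut off the rest of the DataFrame at the point where a time gap of more than 100 ms occurs. For example: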

Input:

  Time X 
0 12:30:00.00 A 
1 12:30:00.100 B 
2 12:30:00.202 C 
3 12:30:00.300 D 

Output:

  Time X 
2 12:30:00.202 C 
3 12:30:00.300 D 

Explanation: there is a gap of more than 100 ms between rows B and C, so we throw away everything above row C.


What is your expected behaviour? Are there multiple 100 ms gaps in the data? Should the last group be kept?


No, truncate at the first gap of more than 100 ms, where "first" means the first one found when scanning from the top.

Answer


You can use diff, compare the result with a Timedelta created by to_timedelta, then take the cumsum and compare it with 1. Finally, use boolean indexing:

import pandas as pd

# parse the time strings; no date is given, so pandas fills in 1900-01-01
df['Time']= pd.to_datetime(df['Time'], format='%H:%M:%S.%f') 

print (df) 
        Time X 
0 1900-01-01 12:30:00.000 A 
1 1900-01-01 12:30:00.100 B 
2 1900-01-01 12:30:00.202 C 
3 1900-01-01 12:30:00.300 D 

print (df.Time.diff()) 
0    NaT 
1 00:00:00.100000 
2 00:00:00.102000 
3 00:00:00.098000 
Name: Time, dtype: timedelta64[ns] 

mask = (((df.Time.diff() > pd.to_timedelta('00:00:00.100000')).cumsum()) >= 1) 
print (mask) 
0 False 
1 False 
2  True 
3  True 
Name: Time, dtype: bool 

print (df[mask]) 
        Time X 
2 1900-01-01 12:30:00.202 C 
3 1900-01-01 12:30:00.300 D 
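
The steps above can be collected into one helper. This is only a minimal sketch, not part of the original answer: the function name truncate_at_first_gap and its parameters are hypothetical, and it assumes the Time column holds strings in %H:%M:%S.%f format.

```python
import pandas as pd


def truncate_at_first_gap(df, time_col="Time", gap="100ms"):
    """Keep the rows from the first gap larger than `gap` onward."""
    # Parse into a helper Series so the original column is left untouched.
    times = pd.to_datetime(df[time_col], format="%H:%M:%S.%f")
    # True where the step from the previous row exceeds the threshold.
    # The first diff is NaT, which compares as False.
    exceeds = times.diff() > pd.Timedelta(gap)
    # cumsum is 0 before the first gap and >= 1 from that row onward.
    keep = exceeds.cumsum() >= 1
    return df.loc[keep.to_numpy()]


df = pd.DataFrame({
    "Time": ["12:30:00.00", "12:30:00.100", "12:30:00.202", "12:30:00.300"],
    "X": ["A", "B", "C", "D"],
})
result = truncate_at_first_gap(df)
print(result)
```

Note that if no gap exceeds the threshold, the mask is all False and the result is empty. The question does not say whether that is the desired behaviour, so treat it as an open choice.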

If you need to leave the Time column unchanged and split at the first difference greater than 100 ms:

df['Time1']= pd.to_datetime(df['Time'], format='%H:%M:%S.%f') 
print (df) 
      Time X     Time1 
0 12:30:00.00 A 1900-01-01 12:30:00.000 
1 12:30:00.100 B 1900-01-01 12:30:00.100 
2 12:30:00.202 C 1900-01-01 12:30:00.202 
3 12:30:00.300 D 1900-01-01 12:30:00.300 
1 12:30:00.100 E 1900-01-01 12:30:00.100 
2 12:30:00.202 F 1900-01-01 12:30:00.202 

print (df.Time1.diff()) 
0      NaT 
1   00:00:00.100000 
2   00:00:00.102000 
3   00:00:00.098000 
1 -1 days +23:59:59.800000 
2   00:00:00.102000 
Name: Time1, dtype: timedelta64[ns] 

mask = (((df.Time1.diff() > pd.to_timedelta('00:00:00.100000')).cumsum()) >= 1) 
print (mask) 
0 False 
1 False 
2  True 
3  True 
1  True 
2  True 
Name: Time1, dtype: bool 

print (df[mask].drop('Time1',axis=1)) 
      Time X 
2 12:30:00.202 C 
3 12:30:00.300 D 
1 12:30:00.100 E 
2 12:30:00.202 F 
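
One detail in the output above is worth spelling out: row E has an earlier timestamp than row D, so its difference is negative (-200 ms, printed as -1 days +23:59:59.800000) and does not count as a gap. The following sketch rebuilds that six-row frame to check this. The construction of df, including the repeated index labels, is an assumption based on the printed output, and the parsed times are kept in a separate Series instead of a Time1 column.

```python
import pandas as pd

# Frame reconstructed from the printed output (index labels repeat).
df = pd.DataFrame(
    {
        "Time": ["12:30:00.00", "12:30:00.100", "12:30:00.202",
                 "12:30:00.300", "12:30:00.100", "12:30:00.202"],
        "X": ["A", "B", "C", "D", "E", "F"],
    },
    index=[0, 1, 2, 3, 1, 2],
)

# Helper Series; df["Time"] itself stays as strings.
parsed = pd.to_datetime(df["Time"], format="%H:%M:%S.%f")
deltas = parsed.diff()

# A backwards jump gives a negative delta, which is not > 100 ms.
mask = (deltas > pd.Timedelta("100ms")).cumsum() >= 1

# A plain boolean array sidesteps alignment on the repeated index labels.
result = df.loc[mask.to_numpy()]
print(result)
```

If backwards jumps should also count as gaps, comparing deltas.abs() instead would be one option, but that goes beyond what the question asks for.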

If you need to split at the last gap instead:

print (df) 
      Time X 
0 12:30:00.00 A 
1 12:30:00.100 B 
2 12:30:00.202 C 
3 12:30:00.300 D 
1 12:30:00.100 E 
2 12:30:00.202 F 

#create helper series 
time_ser= pd.to_datetime(df['Time'], format='%H:%M:%S.%f') 
#get differences 
print (time_ser.diff()) 
0      NaT 
1   00:00:00.100000 
2   00:00:00.102000 
3   00:00:00.098000 
1 -1 days +23:59:59.800000 
2   00:00:00.102000 
Name: Time, dtype: timedelta64[ns] 
#compare with 100ms timedelta and number the groups with cumsum 
mask = (((time_ser.diff() > pd.to_timedelta('00:00:00.100000')).cumsum())) 
print (mask) 
0 0 
1 0 
2 1 
3 1 
1 1 
2 2 
Name: Time, dtype: int32 

#get last value of mask 
last_val = mask.iat[-1] 
print(last_val) 
2 

#compare mask with last value and use boolean indexing 
print (df[mask == last_val]) 
      Time X 
2 12:30:00.202 F 
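
The last-gap variant can be wrapped the same way. Again this is a sketch under assumptions: keep_after_last_gap is a hypothetical name, and the sample frame is reconstructed from the printed output above.

```python
import pandas as pd


def keep_after_last_gap(df, time_col="Time", gap="100ms"):
    """Keep only the rows from the last gap larger than `gap` onward."""
    times = pd.to_datetime(df[time_col], format="%H:%M:%S.%f")
    # Each gap above the threshold starts a new group number.
    group = (times.diff() > pd.Timedelta(gap)).cumsum()
    # Rows in the same group as the final row are the ones after the last gap.
    in_last_group = group == group.iat[-1]
    return df.loc[in_last_group.to_numpy()]


df = pd.DataFrame(
    {
        "Time": ["12:30:00.00", "12:30:00.100", "12:30:00.202",
                 "12:30:00.300", "12:30:00.100", "12:30:00.202"],
        "X": ["A", "B", "C", "D", "E", "F"],
    },
    index=[0, 1, 2, 3, 1, 2],
)
result = keep_after_last_gap(df)
print(result)
```

Unlike the first-gap mask, this version returns the whole frame when no gap exceeds the threshold, because every row then belongs to group 0.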

I edited the answer to split at the last value; please check the solution. Thanks. – jezrael