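Pandas cumulative time series range DataFrame
I'm looking for an "expanding" date range based on the values in the starttime and endtime columns.
If any part of a record overlaps a previous record, I want to return a start time that is the minimum of the two records' start times and an end time that is the maximum of the two records' end times.
These will be grouped by order ID.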
Order  starttime                endtime                  RollingStart             RollingEnd
1      2015-07-01 10:24:43.047  2015-07-01 10:24:43.150  2015-07-01 10:24:43.047  2015-07-01 10:24:43.150
1      2015-07-01 10:24:43.137  2015-07-01 10:24:43.200  2015-07-01 10:24:43.047  2015-07-01 10:24:43.200
1      2015-07-01 10:24:43.197  2015-07-01 10:24:57.257  2015-07-01 10:24:43.047  2015-07-01 10:24:57.257
1      2015-07-01 10:24:57.465  2015-07-01 10:25:13.470  2015-07-01 10:24:57.465  2015-07-01 10:25:13.470
1      2015-07-01 10:24:57.730  2015-07-01 10:25:13.485  2015-07-01 10:24:57.465  2015-07-01 10:25:13.485
2      2015-07-01 10:48:57.465  2015-07-01 10:48:13.485  2015-07-01 10:48:57.465  2015-07-01 10:48:13.485
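So, grouped this way, order 1 in the example above has an initial run from 2015-07-01 10:24:43.047 to 2015-07-01 10:24:57.257, and then another from 2015-07-01 10:24:57.465 to 2015-07-01 10:25:13.485.
Note that although the start times are ordered, the end times are not necessarily ordered, because of the nature of the data (there are short events and long events).
Finally, I only want the last record for each combination of order ID and rolling start (so in this case, the last two records).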
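The grouping described above could be sketched roughly as follows. This is only one possible approach, not a confirmed solution: it assumes the rows can be sorted by Order and starttime, and it treats a row as overlapping when its starttime is not later than the largest endtime seen so far in the same order. The names `merge_overlaps`, `prev_end` and `run` are hypothetical helpers introduced for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Order": [1, 1, 1, 1, 1, 2],
    "starttime": pd.to_datetime([
        "2015-07-01 10:24:43.047", "2015-07-01 10:24:43.137",
        "2015-07-01 10:24:43.197", "2015-07-01 10:24:57.465",
        "2015-07-01 10:24:57.730", "2015-07-01 10:48:57.465"]),
    "endtime": pd.to_datetime([
        "2015-07-01 10:24:43.150", "2015-07-01 10:24:43.200",
        "2015-07-01 10:24:57.257", "2015-07-01 10:25:13.470",
        "2015-07-01 10:25:13.485", "2015-07-01 10:48:13.485"]),
})


def merge_overlaps(frame):
    """Hypothetical helper: label overlapping runs per Order and add rolling bounds."""
    out = frame.sort_values(["Order", "starttime"]).reset_index(drop=True)
    # Largest endtime seen so far within each order, including the current row.
    running_end = out.groupby("Order")["endtime"].cummax()
    # Same value shifted down one row per order, so it excludes the current row.
    prev_end = running_end.groupby(out["Order"]).shift()
    # A new run starts on the first row of an order (NaT) or after a gap.
    new_run = prev_end.isna() | (out["starttime"] > prev_end)
    out["run"] = new_run.cumsum()
    by_run = out.groupby("run")
    out["RollingStart"] = by_run["starttime"].transform("min")
    out["RollingEnd"] = by_run["endtime"].cummax()
    return out


result = merge_overlaps(df)
# Keep only the last record of each (Order, RollingStart) combination.
last = result.drop_duplicates(["Order", "RollingStart"], keep="last")
print(last[["Order", "RollingStart", "RollingEnd"]])
```

With the sample data, `last` holds one row per run: two for order 1 and one for order 2.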
I tried
df['RollingStart'] = np.where((df['endtime'] >= df['RollingStart'].shift()) & (df['RollingEnd'].shift()>= df['starttime']), min(df['starttime'],df['RollingStart']),df['starttime'])
(this obviously doesn't take the order ID into account)
but the error I receive is
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Any ideas would be greatly appreciated.
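A likely cause of this error is the built-in `min()`: it compares the two Series with `<`, gets a boolean Series back, and then needs a single True/False from it. The sketch below reproduces that on two small made-up Series (`a` and `b` are illustrative, not columns from the question) and shows an element-wise alternative using `Series.where`.

```python
import pandas as pd

# Two small illustrative Series; the values are made up for this example.
a = pd.Series(pd.to_datetime(["2015-07-01 10:24:43.137", "2015-07-01 10:24:57.730"]))
b = pd.Series(pd.to_datetime(["2015-07-01 10:24:43.047", "2015-07-01 10:24:57.900"]))

# Built-in min() evaluates `b < a` and then asks for the truth value of the result.
try:
    min(a, b)
    message = ""
except ValueError as exc:
    message = str(exc)
print(message)

# Element-wise minimum: keep a where a <= b, otherwise take b.
elementwise = a.where(a <= b, b)
print(elementwise)
```

This only addresses the ValueError. Comparing against a single `shift()` still looks one row back, so it would not by itself carry a rolling start across a chain of overlapping rows.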
Code to reproduce is below:
from io import StringIO
import numpy as np
import pandas as pd

# Columns are separated by two or more spaces so the timestamps stay intact.
text = """Order  starttime  endtime
1  2015-07-01 10:24:43.047  2015-07-01 10:24:43.150
1  2015-07-01 10:24:43.137  2015-07-01 10:24:43.200
1  2015-07-01 10:24:43.197  2015-07-01 10:24:57.257
1  2015-07-01 10:24:57.465  2015-07-01 10:25:13.470
1  2015-07-01 10:24:57.730  2015-07-01 10:25:13.485
2  2015-07-01 10:48:57.465  2015-07-01 10:48:13.485"""
df = pd.read_csv(StringIO(text), sep=r'\s{2,}', engine='python', parse_dates=[1, 2])
df['RollingStart'] = df['starttime']
df['RollingEnd'] = df['endtime']
# The next statement raises the ValueError shown in the traceback.
df['RollingStart'] = np.where((df['endtime'] >= df['RollingStart'].shift()) & (df['RollingEnd'].shift() >= df['starttime']),
                              min(df['starttime'], df['RollingStart']), df['starttime'])
The error is:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 731, in __nonzero__
.format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
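Thanks
I'm trying to get the earliest start time (and then I'll try to get the latest end time) for each overlapping series. I'm not sure what I conveyed to you, my apologies.
Full code to reproduce has been added inline. Thanks.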