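Pandas DataFrame: aggregate by timestamp into x-minute bins
I have some financial data in the form of timestamped OHLC rows. Now I want to aggregate my pandas DataFrame into 1-minute bars. Is there an elegant way to do this in pandas?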
You may need to do some preprocessing first, but pd.cut() can do this.
>>> import pandas as pd
>>> seconds = [10.5,12.5,22.5,33.5,15.02, 19.26, 35.26]
>>> bins = [10,11,12,13,14,15,20,25,30,40]
>>> cats = pd.cut(seconds, bins)
>>> cats
[(10, 11], (12, 13], (20, 25], (30, 40], (15, 20], (15, 20], (30, 40]]
Once you have this, you can aggregate by that column in whatever way suits your analysis.
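To make the "aggregate by that column" step concrete, here is a minimal sketch. The `ticks` frame, its `price` values, and the choice of aggregates are made up for illustration; only the `seconds` and `bins` values come from the snippet above.

```python
import pandas as pd

# Hypothetical sample: a seconds offset and a price for each tick.
ticks = pd.DataFrame({
    "seconds": [10.5, 12.5, 22.5, 33.5, 15.02, 19.26, 35.26],
    "price": [100.0, 101.0, 99.5, 102.0, 100.5, 101.5, 103.0],
})
bins = [10, 11, 12, 13, 14, 15, 20, 25, 30, 40]

# Label each row with the interval it falls into.
ticks["bin"] = pd.cut(ticks["seconds"], bins)

# Aggregate per interval; observed=True drops intervals with no rows.
summary = (
    ticks.groupby("bin", observed=True)["price"]
    .agg(["first", "max", "min", "last", "count"])
)
print(summary)
```

The "first" and "last" aggregates follow row order, so the frame should be sorted by time before grouping if it is meant to produce open and close values.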
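As @JohnE mentioned, resample is the tool you need. Call .ohlc() on the result of resample to get the desired output. (Older pandas versions accepted how='ohlc' as an argument to resample; that argument has since been removed.)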
import pandas as pd
import numpy as np
# generate some artificial data
# ===========================================
np.random.seed(0)
dt_rng = pd.date_range(start='2015-09-02 09:30:00', end='2015-09-02 15:59:59', freq='s')
df = pd.DataFrame(100+np.random.randn(len(dt_rng)).cumsum(), columns=['px'], index=dt_rng)
print(df)
px
2015-09-02 09:30:00 101.7641
2015-09-02 09:30:01 102.1642
2015-09-02 09:30:02 103.1429
2015-09-02 09:30:03 105.3838
2015-09-02 09:30:04 107.2514
2015-09-02 09:30:05 106.2741
2015-09-02 09:30:06 107.2242
2015-09-02 09:30:07 107.0729
... ...
2015-09-02 15:59:52 79.0222
2015-09-02 15:59:53 81.2040
2015-09-02 15:59:54 81.6277
2015-09-02 15:59:55 82.3117
2015-09-02 15:59:56 83.0102
2015-09-02 15:59:57 82.7588
2015-09-02 15:59:58 81.0294
2015-09-02 15:59:59 81.3962
[23400 rows x 1 columns]
# processing
# =======================
# the how= argument was removed from resample; call .ohlc() on the resampler instead
df.resample('1min').ohlc()
px
open high low close
2015-09-02 09:30:00 101.7641 113.8188 101.7641 104.6000
2015-09-02 09:31:00 103.9276 115.9134 96.2217 115.9134
2015-09-02 09:32:00 116.2898 120.5850 115.1904 116.7901
2015-09-02 09:33:00 116.4361 116.5853 108.7353 111.4434
2015-09-02 09:34:00 110.8060 110.8060 99.6007 108.2589
2015-09-02 09:35:00 106.9523 108.6105 92.8644 93.4848
2015-09-02 09:36:00 94.1833 95.6041 84.2610 91.4362
2015-09-02 09:37:00 92.3657 92.9479 80.2402 85.0347
... ... ... ... ...
2015-09-02 15:52:00 64.6560 69.4697 56.4659 69.1167
2015-09-02 15:53:00 69.3775 73.6731 64.6894 73.6731
2015-09-02 15:54:00 74.6119 81.2891 67.9659 78.4973
2015-09-02 15:55:00 78.9224 81.8589 72.9847 77.1010
2015-09-02 15:56:00 77.7440 91.1469 77.7440 88.8073
2015-09-02 15:57:00 88.9114 90.8509 83.8462 87.7416
2015-09-02 15:58:00 88.2430 89.0107 80.5122 87.0581
2015-09-02 15:59:00 87.1443 87.1443 77.6822 81.3962
[390 rows x 4 columns]
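The example above builds bars from a single price column. If the input already has separate OHLC columns, as the question suggests, one possible approach is to give each column its own aggregate. This is a sketch under assumptions: the column names `open`, `high`, `low`, `close` and the synthetic data are invented for illustration, not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical input: one OHLC row per second (column names are assumptions).
idx = pd.date_range("2015-09-02 09:30:00", periods=180, freq="s")
px = pd.Series(100 + np.arange(180, dtype=float), index=idx)
ohlc_1s = pd.DataFrame({
    "open": px,
    "high": px + 0.5,
    "low": px - 0.5,
    "close": px + 0.25,
})

# Open is the first open in the bin, high the max of highs,
# low the min of lows, close the last close in the bin.
bars = ohlc_1s.resample("1min").agg(
    {"open": "first", "high": "max", "low": "min", "close": "last"}
)
print(bars)
```

Changing '1min' to another frequency string such as '5min' gives x-minute bars with the same code.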
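Could you please add the code you have so far and some sample data? – albert
You could round the timestamps to the nearest (x?) minute and then perform the aggregation. Maybe you can find some inspiration [here](http://stackoverflow.com/questions/24479577/pandas-timestamp-index-rounding-to-the-nearest-5th-minute) – chappers
'resample' is the standard way to do this – JohnE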
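For completeness, the rounding idea from chappers' comment could look roughly like the sketch below. It uses `DatetimeIndex.floor` rather than rounding to the nearest minute, so that each bar only contains timestamps from its own interval; that substitution, the 5-minute width, and the sample data are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data: one price per second for ten minutes.
idx = pd.date_range("2015-09-02 09:30:00", periods=600, freq="s")
df = pd.DataFrame({"px": range(600)}, index=idx)

# Floor each timestamp to the start of its 5-minute bucket, then group on it.
bucket = df.index.floor("5min")
bars = df.groupby(bucket)["px"].agg(["first", "max", "min", "last"])
print(bars)
```

For regularly spaced bins this should give the same bars as resample, which remains the more direct tool.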