Python pandas: detecting the frequency of a time series

Suppose I have loaded a time series from SQL or CSV (it was not created in Python), and the index is:

DatetimeIndex(['2015-03-02 00:00:00', '2015-03-02 01:00:00', 
       '2015-03-02 02:00:00', '2015-03-02 03:00:00', 
       '2015-03-02 04:00:00', '2015-03-02 05:00:00', 
       '2015-03-02 06:00:00', '2015-03-02 07:00:00', 
       '2015-03-02 08:00:00', '2015-03-02 09:00:00', 
       ... 
       '2015-07-19 14:00:00', '2015-07-19 15:00:00', 
       '2015-07-19 16:00:00', '2015-07-19 17:00:00', 
       '2015-07-19 18:00:00', '2015-07-19 19:00:00', 
       '2015-07-19 20:00:00', '2015-07-19 21:00:00', 
       '2015-07-19 22:00:00', '2015-07-19 23:00:00'], 
       dtype='datetime64[ns]', name=u'hour', length=3360, freq=None, tz=None)

As you can see, `freq` is None. I would like to know how to detect the frequency of this series and set `freq` to that frequency.

If possible, I would like this to work when the data is not continuous (there are many gaps in the series).

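I tried to find the mode of the differences between consecutive timestamps, but I do not know how to convert it into a format that a Series can read.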


2015-07-20 Jim

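If there are gaps, should the frequency be set by the two timestamps with the smallest difference? – mdurant

@mdurant Yes, most of the differences between two consecutive timestamps are equal to the smallest difference. – Jim

Maybe try taking the differences of the time index and using the mode (or the smallest difference) as the frequency.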

import pandas as pd 
import numpy as np 

# simulate some data 
# =================================== 
np.random.seed(0) 
dt_rng = pd.date_range('2015-03-02 00:00:00', '2015-07-19 23:00:00', freq='H') 
dt_idx = pd.DatetimeIndex(np.random.choice(dt_rng, size=2000, replace=False)) 
df = pd.DataFrame(np.random.randn(2000), index=dt_idx, columns=['col']).sort_index() 
df 

         col 
2015-03-02 01:00:00 2.0261 
2015-03-02 04:00:00 1.3325 
2015-03-02 05:00:00 -0.9867 
2015-03-02 06:00:00 -0.0671 
2015-03-02 08:00:00 -1.1131 
2015-03-02 09:00:00 0.0494 
2015-03-02 10:00:00 -0.8130 
2015-03-02 11:00:00 1.8453 
...      ... 
2015-07-19 13:00:00 -0.4228 
2015-07-19 14:00:00 1.1962 
2015-07-19 15:00:00 1.1430 
2015-07-19 16:00:00 -1.0080 
2015-07-19 18:00:00 0.4009 
2015-07-19 19:00:00 -1.8434 
2015-07-19 20:00:00 0.5049 
2015-07-19 23:00:00 -0.5349 

[2000 rows x 1 columns] 

# processing 
# ================================== 
# the gap distribution 
res = (pd.Series(df.index[1:]) - pd.Series(df.index[:-1])).value_counts() 

01:00:00 1181 
02:00:00  499 
03:00:00  180 
04:00:00  93 
05:00:00  24 
06:00:00  10 
07:00:00  9 
08:00:00  3 
dtype: int64 

# the mode can be considered as frequency 
res.index[0] # output: Timedelta('0 days 01:00:00') 
# or maybe the smallest difference 
res.index.min() # output: Timedelta('0 days 01:00:00') 




# get full datetime rng 
full_rng = pd.date_range(df.index[0], df.index[-1], freq=res.index[0]) 
full_rng 

DatetimeIndex(['2015-03-02 01:00:00', '2015-03-02 02:00:00', 
       '2015-03-02 03:00:00', '2015-03-02 04:00:00', 
       '2015-03-02 05:00:00', '2015-03-02 06:00:00', 
       '2015-03-02 07:00:00', '2015-03-02 08:00:00', 
       '2015-03-02 09:00:00', '2015-03-02 10:00:00', 
       ... 
       '2015-07-19 14:00:00', '2015-07-19 15:00:00', 
       '2015-07-19 16:00:00', '2015-07-19 17:00:00', 
       '2015-07-19 18:00:00', '2015-07-19 19:00:00', 
       '2015-07-19 20:00:00', '2015-07-19 21:00:00', 
       '2015-07-19 22:00:00', '2015-07-19 23:00:00'], 
       dtype='datetime64[ns]', length=3359, freq='H', tz=None)
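To actually set the detected frequency on the data, one option is `DataFrame.asfreq`, which reindexes onto a regular grid and leaves NaN where timestamps were missing. This is only a sketch: the `gappy` frame, the `step` name and the random seed are made up for illustration and mirror the simulated data above rather than the asker's real data.

```python
import numpy as np
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Hypothetical gappy hourly data, similar to the simulation above.
rng = np.random.default_rng(0)
full = pd.date_range("2015-03-02 00:00:00", "2015-07-19 23:00:00",
                     freq=pd.Timedelta(hours=1))
keep = np.sort(rng.choice(len(full), size=2000, replace=False))
gappy = pd.DataFrame({"col": rng.standard_normal(2000)}, index=full[keep])

# Most common gap between consecutive timestamps (mode of the differences).
step = gappy.index.to_series().diff().mode()[0]

# Reindex onto a regular grid with that step; missing hours become NaN rows.
regular = gappy.asfreq(to_offset(step))

print(regular.index.freq)           # the index now carries a frequency
print(regular["col"].isna().sum())  # number of timestamps that were filled in
```

If NaN rows are not wanted, `asfreq` also takes `method=` or `fill_value=` arguments to fill them.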


2015-07-20 13:40:24

The smallest time difference is found with

np.diff(df.index.values).min()

which is normally in units of ns. To get a frequency in Hz, assuming ns:

freq = 1e9/np.diff(df.index.values).min().astype(int)
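If what you need is something pandas itself accepts as a `freq`, the smallest gap can be wrapped in a `pd.Timedelta` and passed to `to_offset`. A small sketch with a made-up four-point index (`idx`, `smallest`, `offset` and `hz` are illustrative names); going through `pd.Timedelta` also avoids having to assume the raw values are in ns.

```python
import numpy as np
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Hypothetical index with gaps of 3 h, 1 h and 3 h.
idx = pd.DatetimeIndex(["2015-03-02 01:00", "2015-03-02 04:00",
                        "2015-03-02 05:00", "2015-03-02 08:00"])

# Smallest gap between consecutive timestamps, as a pandas Timedelta.
smallest = pd.Timedelta(np.diff(idx.values).min())

# An offset object that date_range, asfreq and resample accept as freq.
offset = to_offset(smallest)

# The same quantity as a rate in Hz, without relying on the underlying unit.
hz = 1 / smallest.total_seconds()

print(offset, hz)
```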


2015-07-20 14:39:00 mdurant

It is worth mentioning that if the data is continuous, you can use the pandas.DatetimeIndex.inferred_freq attribute:

dt_ix = pd.date_range('2015-03-02 00:00:00', '2015-07-19 23:00:00', freq='H') 
dt_ix.freq = None  # clear the stored freq (the private _set_freq is gone in newer pandas)
dt_ix.inferred_freq 
Out[2]: 'H'

Or the pandas.infer_freq function:

pd.infer_freq(dt_ix) 
Out[3]: 'H'

If the index is not continuous, pandas.infer_freq returns None. Similar to what has already been proposed, another option is to use the pandas.Series.diff method:

split_ix = dt_ix.drop(pd.date_range('2015-05-01 00:00:00','2015-05-30 00:00:00', freq='1H')) 
split_ix.to_series().diff().min() 
Out[4]: Timedelta('0 days 01:00:00')
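The two approaches can be combined: try `pd.infer_freq` first and fall back to the most common gap when it returns None. The `detect_freq` helper below is hypothetical (not part of pandas) and assumes a sorted, unique DatetimeIndex with at least two timestamps.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

def detect_freq(index):
    """Hypothetical helper: return an offset for a sorted DatetimeIndex."""
    # infer_freq needs at least 3 timestamps and returns None on gaps.
    inferred = pd.infer_freq(index) if len(index) >= 3 else None
    if inferred is not None:
        return to_offset(inferred)
    # Fall back to the most common gap between consecutive timestamps.
    gaps = index.to_series().diff().dropna()
    return to_offset(gaps.mode()[0])

# Made-up example: 48 hourly stamps, then the same index with 10 removed.
continuous = pd.date_range("2015-03-02", periods=48, freq=pd.Timedelta(hours=1))
gappy = continuous.delete(list(range(10, 20)))

print(detect_freq(continuous))
print(detect_freq(gappy))
```

Using the mode rather than the minimum in the fallback is a judgment call: the mode is less sensitive to a single stray timestamp, while the minimum is safer when most gaps are larger than the true step.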


2017-05-14 12:31:06 Delforge
