How to approximate the periodicity of a pandas time series

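Is there a way to approximate the periodicity of a pandas time series? In R, xts objects have a method called periodicity that serves exactly this purpose. Is there an implemented method in pandas that does this?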

For example, can we infer the frequency of a time series that has no frequency specified?

import pandas.io.data as web 
aapl = web.get_data_yahoo("AAPL") 

<class 'pandas.tseries.index.DatetimeIndex'> 
[2010-01-04 00:00:00, ..., 2013-12-19 00:00:00] 
Length: 999, Freq: None, Timezone: None

The frequency of this series can reasonably be approximated as daily.

Update:

I thought it might be helpful to show the source code of R's implementation of the periodicity method.

function (x, ...) 
{ 
    if (timeBased(x) || !is.xts(x)) 
     x <- try.xts(x, error = "'x' needs to be timeBased or xtsible") 
    p <- median(diff(.index(x))) 
    if (is.na(p)) 
     stop("can not calculate periodicity of 1 observation") 
    units <- "days" 
    scale <- "yearly" 
    label <- "year" 
    if (p < 60) { 
     units <- "secs" 
     scale <- "seconds" 
     label <- "second" 
    } 
    else if (p < 3600) { 
     units <- "mins" 
     scale <- "minute" 
     label <- "minute" 
     p <- p/60L 
    } 
    else if (p < 86400) { 
     units <- "hours" 
     scale <- "hourly" 
     label <- "hour" 
    } 
    else if (p == 86400) { 
     scale <- "daily" 
     label <- "day" 
    } 
    else if (p <= 604800) { 
     scale <- "weekly" 
     label <- "week" 
    } 
    else if (p <= 2678400) { 
     scale <- "monthly" 
     label <- "month" 
    } 
    else if (p <= 7948800) { 
     scale <- "quarterly" 
     label <- "quarter" 
    } 
    structure(list(difftime = structure(p, units = units, class = "difftime"), 
     frequency = p, start = start(x), end = end(x), units = units, 
     scale = scale, label = label), class = "periodicity") 
}

I think this line is the key, and it is the part I don't quite understand: p <- median(diff(.index(x)))
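
For readers less familiar with R, that line can be read as "take the gaps between consecutive timestamps, in seconds, and use their median". A minimal Python sketch of the same idea follows; it assumes a synthetic business-day index (idx is a stand-in, not the actual AAPL data) and is an illustration rather than a port of the xts code.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the AAPL index: business days only, so weekends are skipped.
idx = pd.bdate_range("2013-01-01", periods=20)

# Gaps between consecutive timestamps, expressed in seconds as floats.
gaps = np.diff(idx.values) / np.timedelta64(1, "s")

# Most gaps are one day (86400 s); each weekend produces a three-day gap.
# The median ignores the rarer long gaps and reports the typical spacing.
p = np.median(gaps)
print(p)
```

Because the median is insensitive to the occasional weekend or holiday gap, the result lands on exactly one day, which is the value the R function then classifies as "daily".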

2013-12-20 zsljulius

Might a Fourier transform help? – Paul

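This time series skips weekends (and holidays), so it does not really have a daily frequency to begin with. You could, however, use asfreq to upsample it to a time series with daily frequency: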

aapl = aapl.asfreq('D', method='ffill')

This forward-propagates the last observed value to the dates with missing values.

Note that pandas also has a business-day frequency, so it is also possible to upsample to business days using:

aapl = aapl.asfreq('B', method='ffill')
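
To make the difference between the two calls concrete, here is a small sketch on made-up data. The series s and its prices are hypothetical; the dates are a Friday followed by the next Monday and Tuesday.

```python
import pandas as pd

# Hypothetical prices on Fri 2013-12-13, Mon 2013-12-16 and Tue 2013-12-17.
s = pd.Series(
    [10.0, 11.0, 12.0],
    index=pd.to_datetime(["2013-12-13", "2013-12-16", "2013-12-17"]),
)

# Calendar-day frequency: Saturday and Sunday are added and filled with Friday's value.
daily = s.asfreq("D", method="ffill")

# Business-day frequency: the weekend is not part of the grid, so no rows are added here.
business = s.asfreq("B", method="ffill")

print(daily)
print(business)
```

With 'D' the weekend rows are invented by forward filling, while with 'B' the series keeps only the dates it already had, which is usually closer to what market data represents.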

If you want to automate the process of inferring the median frequency in days, you could do this:

import pandas as pd 
import numpy as np 
import pandas.io.data as web 
aapl = web.get_data_yahoo("AAPL") 
f = np.median(np.diff(aapl.index.values)) 
days = f.astype('timedelta64[D]').item().days 
aapl = aapl.asfreq('{}D'.format(days), method='ffill') 
print(aapl)
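
As a side note, pandas also ships a built-in pd.infer_freq, but it only reports a frequency when the index fits one exactly. The sketch below contrasts it with the median-gap approach on synthetic indexes; regular and gappy are illustrative names, and the dropped weekday merely mimics a market holiday.

```python
import pandas as pd

# A perfectly regular daily index: the built-in inference succeeds.
regular = pd.date_range("2013-01-01", periods=10, freq="D")
print(pd.infer_freq(regular))

# Business days with one weekday dropped to mimic a holiday:
# no single frequency fits exactly, so infer_freq gives up and returns None.
gappy = pd.bdate_range("2013-01-01", periods=30).drop(pd.Timestamp("2013-01-21"))
print(pd.infer_freq(gappy))

# The median gap still yields a usable approximation of the spacing.
median_gap = gappy.to_series().diff().median()
print(median_gap)
```

This is why an approximate measure such as the median gap is useful for real market data, where holidays break any exact calendar rule.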

This code needs testing, but perhaps it comes close to the R code you posted:

import pandas as pd 
import numpy as np 
import pandas.io.data as web 

def infer_freq(ts): 
    med = np.median(np.diff(ts.index.values)) 
    seconds = int(med.astype('timedelta64[s]').item().total_seconds()) 
    if seconds < 60: 
        freq = '{}s'.format(seconds) 
    elif seconds < 3600: 
        freq = '{}T'.format(seconds//60) 
    elif seconds < 86400: 
        freq = '{}H'.format(seconds//3600) 
    elif seconds < 604800: 
        freq = '{}D'.format(seconds//86400) 
    elif seconds < 2678400: 
        freq = '{}W'.format(seconds//604800) 
    elif seconds < 7948800: 
        freq = '{}M'.format(seconds//2678400) 
    else: 
        freq = '{}Q'.format(seconds//7948800) 
    return ts.asfreq(freq, method='ffill') 

aapl = web.get_data_yahoo("AAPL") 
print(infer_freq(aapl))
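
If only a description of the scale is wanted, without resampling the data, something closer to the return value of the R function can be sketched as below. periodicity_label is a hypothetical helper name; the cut-offs and labels are copied from the R source quoted in the question, and the test indexes are synthetic.

```python
import numpy as np
import pandas as pd


def periodicity_label(index):
    """Hypothetical helper: name the scale of a DatetimeIndex from its median gap."""
    if len(index) < 2:
        raise ValueError("cannot calculate periodicity of 1 observation")
    # Median gap between consecutive timestamps, in seconds.
    p = np.median(np.diff(index.values) / np.timedelta64(1, "s"))
    # Same cut-offs and labels as the quoted R source, checked in ascending order.
    if p < 60:
        return "seconds"
    if p < 3600:
        return "minute"
    if p < 86400:
        return "hourly"
    if p == 86400:
        return "daily"
    if p <= 604800:
        return "weekly"
    if p <= 2678400:
        return "monthly"
    if p <= 7948800:
        return "quarterly"
    return "yearly"


business_days = pd.bdate_range("2013-01-01", periods=20)
weekly = pd.date_range("2013-01-06", periods=5, freq="W")
hourly = pd.date_range("2013-01-01", periods=5, freq=pd.Timedelta(hours=1))
month_ends = pd.to_datetime(["2013-01-31", "2013-02-28", "2013-03-31", "2013-04-30"])

for name, idx in [("business_days", business_days), ("weekly", weekly),
                  ("hourly", hourly), ("month_ends", month_ends)]:
    print(name, periodicity_label(idx))
```

Unlike the function above, this sketch keeps the inclusive upper bounds (<=) of the R code, so a median gap of exactly seven days is labelled weekly.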

2013-12-20 21:08:28 unutbu

I don't know about frequency; the only meaningful measure I can come up with is the mean timedelta, for example in days:

>>> import numpy as np 
>>> idx = aapl.index.values 
>>> (np.roll(idx, -1) - idx)[:-1].mean()/np.timedelta64(1, 'D') 
1.4478957915831596

or in hours:

>>> (np.roll(idx, -1) - idx)[:-1].mean()/np.timedelta64(1, 'h') 
34.749498997995836

The same, expressed in a more pandorable way, kudos to @DSM:

>>> aapl.index.to_series().diff().mean()/(60*60*10**9) 
34.749498997995993

Of course the median will be 24 hours, since most days are present in the list:

>>> aapl.index.to_series().diff().median()/(60*60*10**9) 
24.0
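
In recent pandas versions, dividing a Timedelta by a plain integer returns another Timedelta rather than a number, so the division by 60*60*10**9 may not print a float there. A hedged sketch of the same mean-versus-median comparison, dividing by a one-hour Timedelta instead, is shown below on a synthetic business-day index (not the AAPL data, so the mean differs from the figures above).

```python
import pandas as pd

# Synthetic stand-in for the AAPL index: 20 consecutive business days.
idx = pd.bdate_range("2013-01-01", periods=20)

# Gaps between consecutive dates; the first entry is NaT and is dropped.
gaps = idx.to_series().diff().dropna()

# Dividing a Timedelta by a Timedelta yields a plain float (here: hours).
one_hour = pd.Timedelta(hours=1)
mean_hours = gaps.mean() / one_hour
median_hours = gaps.median() / one_hour

print(mean_hours, median_hours)
```

The mean is pulled above 24 hours by the weekend gaps, while the median stays at exactly 24 hours, which matches the observation made for the real data.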

2013-12-20 21:08:14 alko

I think it would be more pandorable to write 'aapl.index.to_series().diff().mean()' or '.median()'. – DSM

@DSM Thanks, 'to_series' is what I was missing to be able to use '.diff()'. – alko
