2016-05-19 86 views
10

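Deprecated rolling window option in OLS from Pandas to Statsmodels

As the title suggests, where has the rolling function option of pandas' ols command migrated to in statsmodels? I can't seem to find it. Pandas tells me doom is in the works: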

FutureWarning: The pandas.stats.ols module is deprecated and will be removed in a future version. We refer to external packages like statsmodels, see some examples here: http://statsmodels.sourceforge.net/stable/regression.html 
    model = pd.ols(y=series_1, x=mmmm, window=50) 
In fact, if you do this:

import statsmodels.api as sm 

model = sm.OLS(series_1, mmmm, window=50).fit() 

print(model.summary()) 

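you get results (the window argument does not break the code), but you only get the parameters of a single regression run over the whole period, not the series of parameters, one set per rolling window, that the option is supposed to produce.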

Answers

2

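I created an ols module designed to mimic pandas' deprecated MovingOLS; it is here.

It has three core classes:

  • OLS: static (single-window) ordinary least-squares regression. The output is NumPy arrays.
  • RollingOLS: rolling (multi-window) ordinary least-squares regression. The output is higher-dimensional NumPy arrays.
  • PandasRollingOLS: wraps the results of RollingOLS in pandas Series & DataFrames. Designed to mimic the look of the deprecated pandas module.

Note that the module is part of a package (which I am currently uploading to PyPi) and that it requires one inter-package import.

The first two classes above are implemented entirely in NumPy and primarily use matrix algebra. RollingOLS also makes extensive use of broadcasting. The attributes largely mimic statsmodels' OLS RegressionResultsWrapper.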
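To illustrate what "matrix algebra plus broadcasting" can look like for rolling windows, here is a minimal sketch. It is not the module's actual implementation: the helper name `rolling_ols_coefs` is hypothetical, and it assumes NumPy 1.20 or newer for `sliding_window_view`. All windows are stacked into one 3-D array and the normal equations are solved for every window in a single batched call.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_ols_coefs(y, x, window):
    """Hypothetical helper: OLS coefficients (intercept first) for each rolling window."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    # Prepend a column of ones for the intercept: shape (n, k + 1)
    X = np.column_stack([np.ones(len(x)), x])
    # Stack the windows: Xw has shape (n_windows, window, k + 1)
    Xw = sliding_window_view(X, window, axis=0).transpose(0, 2, 1)
    yw = sliding_window_view(y, window)               # (n_windows, window)
    # Batched normal equations (X'X) b = X'y, solved for all windows at once
    XtX = Xw.transpose(0, 2, 1) @ Xw                  # (n_windows, k + 1, k + 1)
    Xty = Xw.transpose(0, 2, 1) @ yw[..., None]       # (n_windows, k + 1, 1)
    return np.linalg.solve(XtX, Xty)[..., 0]          # (n_windows, k + 1)

# Usage on synthetic data (an assumption, not the FRED series used in this answer)
rng = np.random.default_rng(1)
x_demo = rng.normal(size=(30, 2))
y_demo = 0.5 + x_demo @ np.array([1.5, -2.0])
coefs = rolling_ols_coefs(y_demo, x_demo, window=12)
print(coefs.shape)
```

The trade-off in this kind of design is memory for speed: the Python-level loop over windows disappears, at the cost of materialising one small matrix per window.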

An example:

# Pull some data from fred.stlouisfed.org 
from pandas_datareader.data import DataReader 

syms = {'TWEXBMTH' : 'usd', 
     'T10Y2YM' : 'term_spread', 
     'PCOPPUSDM' : 'copper' 
     } 
data = (DataReader(syms.keys(), 'fred', start='2000-01-01') 
     .pct_change() 
     .dropna()) 
data = data.rename(columns=syms) 
print(data.head()) 
#                 usd  term_spread    copper
# DATE
# 2000-02-01  0.01260     -1.40909  -0.01997
# 2000-03-01 -0.00012      2.00000  -0.03720
# 2000-04-01  0.00564      0.51852  -0.03328
# 2000-05-01  0.02204     -0.09756   0.06135
# 2000-06-01 -0.01012      0.02703  -0.01850

# Rolling regressions 

from pyfinance.ols import PandasRollingOLS 

y = data.usd 
x = data.drop('usd', axis=1) 

window = 12 # months 
model = PandasRollingOLS(y=y, x=x, window=window) 

print(model.beta.head()) # Coefficients excluding the intercept 
#             term_spread    copper
# DATE
# 2001-01-01      0.00010   0.05568
# 2001-02-01      0.00047   0.06271
# 2001-03-01      0.00147   0.03576
# 2001-04-01      0.00161   0.02956
# 2001-05-01      0.00158  -0.04497

print(model.fstat.head()) 
# DATE 
# 2001-01-01 0.28121 
# 2001-02-01 0.42602 
# 2001-03-01 0.38802 
# 2001-04-01 0.39230 
# 2001-05-01 0.41706 
# Freq: MS, Name: fstat, dtype: float64 

print(model.rsq.head()) # R-squared 
# DATE 
# 2001-01-01 0.05882 
# 2001-02-01 0.08648 
# 2001-03-01 0.07938 
# 2001-04-01 0.08019 
# 2001-05-01 0.08482 
# Freq: MS, Name: rsq, dtype: float64 
+0

The GitHub link no longer works.

+1

@CharlesPlager Thanks for bringing that to my attention; the link has been updated.

1

Rolling beta with sklearn

import pandas as pd 
from sklearn import linear_model 

def rolling_beta(X, y, idx, window=255): 

    assert len(X)==len(y) 

    out_dates = [] 
    out_beta = [] 

    model_ols = linear_model.LinearRegression() 

    for iStart in range(0, len(X)-window):   
     iEnd = iStart+window 

     model_ols.fit(X[iStart:iEnd], y[iStart:iEnd]) 

     #store output 
     out_dates.append(idx[iEnd]) 
     out_beta.append(model_ols.coef_[0][0]) 

    return pd.DataFrame({'beta':out_beta}, index=out_dates) 


df_beta = rolling_beta(df_rtn_stocks['NDX'].values.reshape(-1, 1), df_rtn_stocks['CRM'].values.reshape(-1, 1), df_rtn_stocks.index.values, 255) 
0

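Added for completeness: this limits the computation to the regression coefficients and the final estimate only.

NumPy rolling regression function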

import numpy as np
import pandas as pd

def rolling_regression(y, x, window=60):
    """
    Rolling OLS of y on x plus a constant.

    y and x must be pandas.Series. Returns the fitted value at the last
    observation of each window, or None if there are fewer than `window`
    observations.
    """
    # === Clean-up ============================================================
    x = x.dropna()
    y = y.dropna()
    # === Trim acc to shortest ================================================
    if x.index.size > y.index.size:
        x = x[y.index]
    else:
        y = y[x.index]
    # === Verify enough space =================================================
    if x.index.size < window:
        return None
    # === Add a constant ======================================================
    X = x.to_frame()
    X['c'] = 1
    # === Loop... this can be improved ========================================
    estimate_data = []
    for i in range(window, x.index.size + 1):
        # always index in np as opposed to pandas, much faster
        X_slice = X.values[i-window:i, :]
        y_slice = y.values[i-window:i]
        # OLS coefficients via the normal equations: (X'X)^-1 X'y
        coeff = np.dot(np.dot(np.linalg.inv(np.dot(X_slice.T, X_slice)), X_slice.T), y_slice)
        # fitted value at the last observation of the current window
        estimate_data.append(coeff[0] * x.values[i-1] + coeff[1])
    # === Assemble ============================================================
    estimate = pd.Series(data=estimate_data, index=x.index[window-1:])
    return estimate
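A speedier NumPy-only solution

In some specific use cases, which only require the final estimate of the regression, x.rolling(window=60).apply(my_ols) appears to be somewhat slow.

As a reminder, the coefficients of a regression can be calculated as a matrix product, as you can read on wikipedia's least squares page. Computing that product with NumPy's matrix multiplication can speed the process up somewhat compared with using the ols in statsmodels. The product is expressed in the line starting with coeff = ...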
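As a quick sanity check of that matrix product, the sketch below compares it with NumPy's own least-squares solver. The data are synthetic and the variable names are illustrative assumptions; the design matrix is laid out like the one in the function above, with the regressor first and the constant last.

```python
import numpy as np

# Synthetic single-window data (assumption): y = 0.5 * x + 2 plus small noise
rng = np.random.default_rng(42)
n = 60
x = rng.normal(size=n)
y = 0.5 * x + 2.0 + rng.normal(scale=0.1, size=n)

# Design matrix: regressor first, constant last
X = np.column_stack([x, np.ones(n)])

# Normal-equations product: (X'X)^-1 X'y
coeff_product = np.linalg.inv(X.T @ X) @ X.T @ y

# Reference result from NumPy's least-squares solver
coeff_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(coeff_product)   # [slope, intercept]
print(coeff_lstsq)
```

Both approaches should agree to floating-point precision on well-conditioned data. Explicitly inverting X'X is fast but can become numerically fragile when the regressors are nearly collinear, in which case a solver such as `np.linalg.lstsq` is the safer choice.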