Pandas rolling apply custom function

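I have been following a similar answer here, but I'm running into problems when using sklearn with rolling apply. I want to create z-scores and run PCA with a rolling apply, but I keep getting the 'only length-1 arrays can be converted to Python scalars' error.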

Following the previous example, I create a DataFrame:

from sklearn.preprocessing import StandardScaler 
import pandas as pd 
import numpy as np 
sc=StandardScaler() 
tmp=pd.DataFrame(np.random.randn(2000,2)/10000,index=pd.date_range('2001-01-01',periods=2000),columns=['A','B'])

If I use the rolling command:

tmp.rolling(window=5,center=False).apply(lambda x: sc.fit_transform(x)) 
TypeError: only length-1 arrays can be converted to Python scalars

I get this error. However, I can create functions using the mean and standard deviation without any problem.

def test(df): 
    return np.mean(df) 
tmp.rolling(window=5,center=False).apply(lambda x: test(x))

I believe the error happens when I try to subtract the mean from the current value to get the z-score.

def test2(df): 
    return df-np.mean(df) 
tmp.rolling(window=5,center=False).apply(lambda x: test2(x)) 
only length-1 arrays can be converted to Python scalars
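Since rolling().apply has to return one number per window and per column, one way to keep a z-score inside apply is to return only the z-score of the last value in each window. This is a sketch under a few assumptions: the sample standard deviation (ddof=1, the same default rolling().std() uses), pandas 0.23 or later for raw=True, and seeded random data shaped like tmp; the helper name last_zscore is illustrative, not from the question.

```python
import numpy as np
import pandas as pd

# Same kind of frame as in the question, seeded so results are repeatable
rng = np.random.default_rng(0)
tmp = pd.DataFrame(rng.standard_normal((2000, 2)) / 10000,
                   index=pd.date_range('2001-01-01', periods=2000),
                   columns=['A', 'B'])

def last_zscore(window):
    # `window` is a 1-D NumPy array: one column's values for one window.
    # Return a single float so rolling().apply can store it.
    return (window[-1] - window.mean()) / window.std(ddof=1)

z_apply = tmp.rolling(window=5).apply(last_zscore, raw=True)
```

The result should match the vectorized form (tmp - tmp.rolling(5).mean()) / tmp.rolling(5).std(), with the first four rows left as NaN.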

How can I create a custom rolling function with sklearn that first standardizes the data and then runs PCA?

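Edit: I realize my question wasn't entirely clear, so I'll try again. I want to standardize my values and then run PCA to get the amount of variance explained by each factor. Doing this without rolling is fairly straightforward: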

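from sklearn import decomposition
testing=sc.fit_transform(tmp)  # standardize each column
pca=decomposition.PCA()  # run PCA; decomposition.PCA is the public import path
pca.fit(testing)
pca.explained_variance_ratio_
array([ 0.50967441, 0.49032559])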
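For reference, when all components are kept, explained_variance_ratio_ equals the eigenvalues of the feature covariance matrix of the standardized data, sorted largest first and divided by their sum. A small sketch checking this on seeded random data shaped like tmp (not the question's actual data); np.linalg.eigvalsh is used because a covariance matrix is symmetric.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
tmp = pd.DataFrame(rng.standard_normal((2000, 2)) / 10000,
                   index=pd.date_range('2001-01-01', periods=2000),
                   columns=['A', 'B'])

# Standardize each column, then fit PCA on the whole frame (no rolling)
scaled = StandardScaler().fit_transform(tmp)
ratio_sklearn = PCA().fit(scaled).explained_variance_ratio_

# Same quantity by hand: eigenvalues of the features x features covariance,
# reversed from ascending to descending order and normalized to sum to 1
eigvals = np.linalg.eigvalsh(np.cov(scaled, rowvar=False))[::-1]
ratio_numpy = eigvals / eigvals.sum()
```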

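I can't use the same procedure with rolling. Using @piRSquared's rolling z-score function gives me the z-scores. It seems that sklearn's PCA is not compatible with custom functions in rolling apply (in fact, I think this is the case for most sklearn modules). I'm only trying to get the explained variance, which is a one-dimensional item, but the code below returns a bunch of NaNs.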

def test3(df): 
    pca.fit(df) 
    return pca.explained_variance_ratio_ 
tmp.rolling(window=5,center=False).apply(lambda x: test3(x))
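A likely reason test3 cannot work is that rolling().apply calls the function once per column and hands it a 1-D window of that column only, so PCA never sees the 5 x 2 block it needs. The sketch below only records what the function receives, using raw=True (pandas 0.23 or later) so each window arrives as a NumPy array; record_shape and seen_shapes are hypothetical names for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
tmp = pd.DataFrame(rng.standard_normal((20, 2)), columns=['A', 'B'])

seen_shapes = []  # shape of every object rolling().apply hands to the function

def record_shape(window):
    seen_shapes.append(window.shape)
    return 0.0  # apply needs a scalar back; the value itself does not matter here

tmp.rolling(window=5).apply(record_shape, raw=True)
print(set(seen_shapes))  # {(5,)}: one column at a time, never a (5, 2) block
```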

I can also write my own explained-variance function, but that doesn't work either.

def test4(df): 
    cov_mat=np.cov(df.T) #need covariance of features, not observations 
    eigen_vals,eigen_vecs=np.linalg.eig(cov_mat) 
    tot=sum(eigen_vals) 
    var_exp=[(i/tot) for i in sorted(eigen_vals,reverse=True)] 
    return var_exp 
tmp.rolling(window=5,center=False).apply(lambda x: test4(x))

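I get the error '0-dimensional array given. Array must be at least two-dimensional'. To recap: I want to compute rolling z-scores and then output the explained variance for each rolling window. I have the rolling z-scores working, but not the explained variance.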


2016-12-04 Bobe Kryant

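What output do you expect? Pandas rolling functions are supposed to produce a single scalar value from a chunk of input. If you want to do more complex operations on the chunks, you'll have to "roll your own". – BrenBarn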
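Rolling your own can be as simple as slicing the frame into windows with iloc and running StandardScaler and PCA on each 2-D slice. A minimal sketch, assuming a window of 5 and labeling each result with the window's last date (matching center=False); the frame is shortened to 200 rows of seeded random data because one sklearn fit per window gets slow on long series.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
tmp = pd.DataFrame(rng.standard_normal((200, 2)) / 10000,
                   index=pd.date_range('2001-01-01', periods=200),
                   columns=['A', 'B'])

window = 5
dates, ratios = [], []
# Step through the frame manually so each step sees the full 2-D window
for end in range(window, len(tmp) + 1):
    block = tmp.iloc[end - window:end]               # window x 2 slice
    scaled = StandardScaler().fit_transform(block)   # z-score within the window
    pca = PCA().fit(scaled)                          # keep all components
    dates.append(tmp.index[end - 1])                 # label by the window's last date
    ratios.append(pca.explained_variance_ratio_)

var_exp = pd.DataFrame(ratios, index=dates, columns=['PC1', 'PC2'])
```

Each row then holds the explained-variance ratios for the five rows ending on that date.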

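As @BrenBarn commented, the rolling function needs to reduce a vector to a single number. The following is equivalent to what you are trying to do and helps highlight the problem: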

zscore = lambda x: (x - x.mean())/x.std() 
tmp.rolling(5).apply(zscore)

TypeError: only length-1 arrays can be converted to Python scalars

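In the zscore function, x.mean() reduces to a scalar and x.std() reduces to a scalar, but x itself is an array. Therefore, the whole expression is an array.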

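The way around this is to apply rolling only to the parts of the z-score calculation that need it, and not to the part that causes the problem: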

(tmp - tmp.rolling(5).mean())/tmp.rolling(5).std()
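One detail when comparing with the question's StandardScaler: rolling().std() defaults to the sample standard deviation (ddof=1), while StandardScaler divides by n (ddof=0). Passing ddof=0 should make the rolling z-scores line up with StandardScaler applied to each window; the sketch checks the last window on seeded random data.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
tmp = pd.DataFrame(rng.standard_normal((2000, 2)) / 10000,
                   index=pd.date_range('2001-01-01', periods=2000),
                   columns=['A', 'B'])

window = 5
roll = tmp.rolling(window)
z_sample = (tmp - roll.mean()) / roll.std()      # ddof=1, the pandas default
z_pop = (tmp - roll.mean()) / roll.std(ddof=0)   # ddof=0, same as StandardScaler

# Cross-check the last window: StandardScaler fitted on the last 5 rows only
last_block = StandardScaler().fit_transform(tmp.iloc[-window:])
```

The two versions differ only by the constant factor sqrt(n/(n-1)), about 1.118 for a 5-row window.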


2016-12-04 06:31:59 piRSquared

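Thanks for the z-score part. I tried to do something similar for the PCA part, to no avail. Is the lambda messing up PCA because I'm operating on multiple rows rather than just one? –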
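For the two-column case there is a vectorized shortcut in the same spirit as the z-score answer. After standardizing, the covariance of a window is proportional to its correlation matrix [[1, r], [r, 1]], whose eigenvalues are 1 + |r| and 1 - |r|; the explained-variance ratios are therefore (1 + |r|)/2 and (1 - |r|)/2, where r is the rolling correlation between A and B. This is a sketch for exactly two columns on seeded random data; with more columns the eigenvalues lose this closed form and a per-window loop is needed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
tmp = pd.DataFrame(rng.standard_normal((2000, 2)) / 10000,
                   index=pd.date_range('2001-01-01', periods=2000),
                   columns=['A', 'B'])

# Rolling Pearson correlation between the two columns over 5-row windows
r = tmp['A'].rolling(5).corr(tmp['B'])

# Eigenvalues of [[1, r], [r, 1]] are 1 + |r| and 1 - |r| (sum 2),
# so the explained-variance ratios follow directly from r
var_exp_fast = pd.DataFrame({'PC1': (1 + r.abs()) / 2,
                             'PC2': (1 - r.abs()) / 2})
```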
