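Python error: len() of unsized object when using statsmodels with a single row of data
I can use statsmodels' WLS (weighted least squares regression) without trouble when I have many data points. However, when I try to use WLS on a single sample from the dataset, I seem to run into a problem with numpy arrays.
What I mean is: if my dataset X is a 2-D array with many rows, WLS works fine. If I try to work on a single row, it does not. You can see what I mean in the code below: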
import sys
from sklearn.externals.six.moves import xrange
from sklearn.metrics import accuracy_score
import pylab as pl
from sklearn.externals.six.moves import zip
import numpy as np
import statsmodels.api as sm
from statsmodels.sandbox.regression.predstd import wls_prediction_std
# this is my dataset X, with 10 rows
X = np.array([[1,2,3],[1,2,3],[4,5,6],[1,2,3],[4,5,6],[1,2,3],[1,2,3],[4,5,6],[4,5,6],[1,2,3]])
# this is my response vector, y, also with 10 rows
y = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 1])
# weights, 10 rows
weights = np.array([ 0.1 , 0.1, 0.1 , 0.1, 0.1 , 0.1, 0.1 , 0.1, 0.1 , 0.1 ])
# the line below, using all 10 rows of X, gives no errors but is commented out
# mod_wls = sm.WLS(y, X, weights)
# and this is the line I need, which is giving errors:
mod_wls = sm.WLS(np.array(y[0]), np.array([X[0]]),np.array([weights[0]]))
The last line was originally just mod_wls = sm.WLS(y[0], X[0], weights[0])
but that gave me the error object of type 'numpy.float64' has no len()
so I turned the arguments into arrays. Now I keep getting this error instead:
Traceback (most recent call last):
File "C:\Users\app\Documents\Python Scripts\test.py", line 53, in <module>
mod_wls = sm.WLS(np.array(y[0]), np.array([X[0]]),np.array([weights[0]]))
File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\regression\linear_model.py", line 383, in __init__
weights=weights, hasconst=hasconst)
File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\regression\linear_model.py", line 79, in __init__
super(RegressionModel, self).__init__(endog, exog, **kwargs)
File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\base\model.py", line 136, in __init__
super(LikelihoodModel, self).__init__(endog, exog, **kwargs)
File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\base\model.py", line 52, in __init__
self.data = handle_data(endog, exog, missing, hasconst, **kwargs)
File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\base\data.py", line 401, in handle_data
return klass(endog, exog=exog, missing=missing, hasconst=hasconst, **kwargs)
File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\base\data.py", line 78, in __init__
self._check_integrity()
File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\base\data.py", line 249, in _check_integrity
print len(self.endog)
TypeError: len() of unsized object
So, to see what was wrong with the lengths, I did this:
print "y size: "
print len(np.array([y[0]]))
print "X size"
print len (np.array([X[0]]))
print "weights size"
print len(np.array([weights[0]]))
and got this output:
y size:
1
X size
1
weights size
1
Then I tried this:
print "x shape"
print X[0].shape
print "y shape"
print y[0].shape
The output was:
x shape
(3L,)
y shape
()
Line 249 of data.py, which the error refers to, is inside the function below. I added a few print statements there to see what was going on:
def _check_integrity(self):
if self.exog is not None:
print "exog size: "
print len(self.exog)
print "endog size"
print len(self.endog) # <-- this, and the line below are causing the error
if len(self.exog) != len(self.endog):
raise ValueError("endog and exog matrices are different sizes")
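It looks like len(self.endog) is the problem. Yet when I print len(np.array([y[0]])), it simply outputs 1. Somehow, when y goes into the _check_integrity function and becomes endog, it behaves differently... or is something else going on?
What should I do? I am using an algorithm in which I really need to run WLS separately for each row of X.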
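One possible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: the failing call passes np.array(y[0]), which is a 0-d array with shape (), while the length check above measured np.array([y[0]]), which is 1-d. The snippet below uses NumPy only and is written for Python 3; single_row is a hypothetical helper name, not part of statsmodels.

```python
import numpy as np

# A shortened copy of the data from the question.
X = np.array([[1, 2, 3], [1, 2, 3], [4, 5, 6]])
y = np.array([1, 1, 0])
weights = np.array([0.1, 0.1, 0.1])

# What the failing call passes as endog: a 0-d array, which has no len().
endog_0d = np.array(y[0])
print(endog_0d.shape)

# What the length check in the question measured instead: a 1-d array.
endog_1d = np.array([y[0]])
print(endog_1d.shape)


def single_row(y, X, weights, i):
    """Hypothetical helper: return row i with shapes (1,), (1, k), (1,)."""
    return (np.atleast_1d(y[i]),
            np.atleast_2d(X[i]),
            np.atleast_1d(weights[i]))


endog, exog, w = single_row(y, X, weights, 0)
print(endog.shape, exog.shape, w.shape)
```

Passing these three arrays to sm.WLS should get past the len() comparison in _check_integrity, although a single observation still cannot identify three parameters.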
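Why do you want to do this? Single-observation regression is something we (the statsmodels developers) never considered. In your example, you are trying to estimate 3 parameters from 1 observation. – user333700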
I am trying to carry out step (2)(ii) here: http://stats.stackexchange.com/questions/93691/ . If $X$ is the full set of observations and $x_i$ is a single observation, doesn't that step mean an estimate on a single observation? – user961627
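Step (2)(ii): I am pretty sure it means using all i, i.e. regressing the vector g on the matrix $(x_i)$ using the weight vector (w_i). It sounds similar to iteratively reweighted least squares, which is a common optimization method. I have not tried to work through the details. – user333700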
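To illustrate the reading suggested in the last comment, here is a minimal NumPy sketch of one weighted regression over all rows at once, with one weight per observation. It assumes the weights are fixed and known, and wls_all_rows is a hypothetical name; in statsmodels the equivalent route would be the sm.WLS(y, X, weights) call that already works in the question. The code is written for Python 3.

```python
import numpy as np

X = np.array([[1, 2, 3], [1, 2, 3], [4, 5, 6], [1, 2, 3], [4, 5, 6],
              [1, 2, 3], [1, 2, 3], [4, 5, 6], [4, 5, 6], [1, 2, 3]])
y = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 1])
weights = np.full(10, 0.1)


def wls_all_rows(y, X, weights):
    """Hypothetical helper: minimise sum_i w_i * (y_i - x_i . beta)**2."""
    sw = np.sqrt(weights)
    # Scaling each row by sqrt(w_i) turns the weighted problem into an
    # ordinary least-squares problem that lstsq can solve.
    beta, _, _, _ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


beta = wls_all_rows(y, X, weights)
fitted = X.dot(beta)
print(beta)
print(fitted)
```

Because this toy X contains only two distinct rows, it is rank-deficient, so lstsq returns one of many coefficient vectors that fit the data equally well. In an iteratively reweighted scheme, the weights would be recomputed and this fit repeated on every pass.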