Python error: len() of unsized object when using statsmodels with one row of data

When I have many data points, I can use statsmodels' WLS (weighted least squares regression) without trouble. However, when I try to use WLS on a single sample from the dataset, I seem to run into a problem with numpy arrays.

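What I mean is that if my dataset X is a 2-D array with many rows, WLS works fine. But if I try to work on a single row, it does not. You'll see what I mean in the code below: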

import sys 
from sklearn.externals.six.moves import xrange 
from sklearn.metrics import accuracy_score 
import pylab as pl 
from sklearn.externals.six.moves import zip 
import numpy as np 
import statsmodels.api as sm 
from statsmodels.sandbox.regression.predstd import wls_prediction_std 

# this is my dataset X, with 10 rows 
X = np.array([[1,2,3],[1,2,3],[4,5,6],[1,2,3],[4,5,6],[1,2,3],[1,2,3],[4,5,6],[4,5,6],[1,2,3]]) 
# this is my response vector, y, also with 10 rows 
y = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 1]) 
# weights, 10 rows 
weights = np.array([ 0.1 , 0.1, 0.1 , 0.1, 0.1 , 0.1, 0.1 , 0.1, 0.1 , 0.1 ]) 

# the line below, using all 10 rows of X, gives no errors but is commented out 
# mod_wls = sm.WLS(y, X, weights) 
# and this is the line I need, which is giving errors: 
mod_wls = sm.WLS(np.array(y[0]), np.array([X[0]]),np.array([weights[0]]))

The last line was originally just mod_wls = sm.WLS(y[0], X[0], weights[0])

But that gave me the error object of type 'numpy.float64' has no len(), so I turned them into arrays. Now I keep getting this error:

Traceback (most recent call last): 
    File "C:\Users\app\Documents\Python Scripts\test.py", line 53, in <module> 
    mod_wls = sm.WLS(np.array(y[0]), np.array([X[0]]),np.array([weights[0]])) 
    File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\regression\linear_model.py", line 383, in __init__ 
    weights=weights, hasconst=hasconst) 
    File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\regression\linear_model.py", line 79, in __init__ 
    super(RegressionModel, self).__init__(endog, exog, **kwargs) 
    File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\base\model.py", line 136, in __init__ 
    super(LikelihoodModel, self).__init__(endog, exog, **kwargs) 
    File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\base\model.py", line 52, in __init__ 
    self.data = handle_data(endog, exog, missing, hasconst, **kwargs) 
    File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\base\data.py", line 401, in handle_data 
    return klass(endog, exog=exog, missing=missing, hasconst=hasconst, **kwargs) 
    File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\base\data.py", line 78, in __init__ 
    self._check_integrity() 
    File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\base\data.py", line 249, in _check_integrity 
    print len(self.endog) 
TypeError: len() of unsized object

So, to see what was wrong with the lengths, I did this:

print "y size: " 
print len(np.array([y[0]])) 
print "X size" 
print len (np.array([X[0]])) 
print "weights size" 
print len(np.array([weights[0]]))

and got this output:

y size: 
1 
X size 
1 
weights size 
1

Then I tried this:

print "x shape" 
print X[0].shape 
print "y shape" 
print y[0].shape

The output was:

x shape 
(3L,) 
y shape 
()

Line 249 of data.py, which the error refers to, is inside this function, where I added a bunch of size prints to see what was going on:

def _check_integrity(self):
    if self.exog is not None:
        print "exog size: "
        print len(self.exog)
        print "endog size"
        print len(self.endog)  # <-- this, and the line below are causing the error
        if len(self.exog) != len(self.endog):
            raise ValueError("endog and exog matrices are different sizes")

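It looks like len(self.endog) is the problem. Yet when I print len(np.array([y[0]])), it simply outputs 1. Somehow, when y gets into the _check_integrity function and becomes endog, it behaves differently... or is something else going on?

What should I do? I'm working with an algorithm where I really need to run WLS separately for each row of X.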
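One detail worth checking on the len() error itself: the failing call wraps the response as np.array(y[0]), while the length check wraps it as np.array([y[0]]). These are not the same object. The first is a 0-dimensional array with shape (), which has no length; the second is a 1-dimensional array of length 1. The sketch below is numpy-only and written for Python 3 (an assumption, since the code in the question is Python 2); the variable names are illustrative.

```python
import numpy as np

y = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 1])

# Wrapping a scalar directly gives a 0-d array: shape (), no length.
scalar_wrapped = np.array(y[0])
# Wrapping a one-element list gives a 1-d array: shape (1,), length 1.
list_wrapped = np.array([y[0]])

print(scalar_wrapped.shape)
print(list_wrapped.shape)

len_failed = False
try:
    len(scalar_wrapped)
except TypeError as exc:
    # A 0-d array is "unsized", so len() raises TypeError.
    len_failed = True
    print("len() failed:", exc)

print(len(list_wrapped))
```

Passing y[[0]] or np.array([y[0]]) as the response would at least give an array that has a length, though whether a one-observation fit is meaningful is a separate question.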


2014-04-28 user961627

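Why do you want to do this? Regression on a single observation is a case that we (the statsmodels developers) never considered. In your example, you are trying to estimate 3 parameters from 1 observation. – user333700

I'm trying to carry out step (2)(ii) described here: http://stats.stackexchange.com/questions/93691/. If $X$ is the complete set of observations and $x_i$ is one observation, doesn't that mean estimating on a single observation? – user961627

Step (2)(ii): I'm pretty sure this means using all i, i.e. regressing the g vector on the $(x_i)$ matrix using the weight vector (w_i). It sounds similar to iteratively reweighted least squares, which is a common optimization method. I didn't try to work through the details. – user333700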

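There is no such thing as WLS for one observation. When the weights are normalized to sum to 1, a single weight simply becomes 1. If you want to do this, though I doubt you really do, just use OLS. The solution will be a consequence of the SVD, not of any actual relationship in the data.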
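To illustrate why the weight drops out with one observation, here is a minimal numpy-only sketch. It assumes the usual WLS formulation (scale the row and the response by the square root of the weight, then solve by least squares via the pseudoinverse). The helper name single_row_wls is hypothetical and not part of statsmodels.

```python
import numpy as np

x0 = np.array([[1.0, 2.0, 3.0]])  # first row of X from the question, shape (1, 3)
y0 = np.array([1.0])              # first response value


def single_row_wls(x, y, w):
    """Minimum-norm weighted least squares for one row (illustrative helper)."""
    s = np.sqrt(w)
    # Scaling both sides by sqrt(w) is the standard WLS whitening step.
    return np.dot(np.linalg.pinv(s * x), s * y)


beta_small = single_row_wls(x0, y0, 0.1)
beta_unit = single_row_wls(x0, y0, 1.0)
beta_large = single_row_wls(x0, y0, 25.0)

# With a single row, the weight cancels: all three estimates coincide.
print(beta_small)
print(beta_unit)
print(beta_large)
```

Because the same factor multiplies both the row and the response, any positive weight gives the same coefficients as the unweighted (OLS) fit.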

OLS solution using pinv/SVD:

np.dot(np.linalg.pinv(X[[0]]), y[0])

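That said, you could just make up any answer that fits this one row and get the same fitted value. I'm not sure offhand how the properties of the SVD solution differ from those of the other non-unique solutions.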

[~/] 
[26]: beta = [-.5, .25, 1/3.] 

[~/] 
[27]: np.dot(beta, X[0]) 
[27]: 1.0
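On how the pinv solution differs from the others: for an underdetermined system, the Moore-Penrose pseudoinverse returns the exact solution with the smallest Euclidean norm. The sketch below assumes the X[0] and y[0] from the question and compares the pinv coefficients with the hand-picked beta above. It is written for Python 3 and uses only numpy.

```python
import numpy as np

x0 = np.array([[1.0, 2.0, 3.0]])  # X[[0]] from the question, shape (1, 3)
y0 = np.array([1.0])              # y[0] from the question

# Minimum-norm solution via the pseudoinverse.
beta_pinv = np.dot(np.linalg.pinv(x0), y0)
# Hand-picked coefficients that also reproduce y0 exactly.
beta_made_up = np.array([-0.5, 0.25, 1 / 3.0])

fit_pinv = np.dot(x0, beta_pinv)[0]
fit_made_up = np.dot(x0, beta_made_up)[0]

norm_pinv = np.linalg.norm(beta_pinv)
norm_made_up = np.linalg.norm(beta_made_up)

print("fitted values:", fit_pinv, fit_made_up)
print("coefficient norms:", norm_pinv, norm_made_up)
```

Both coefficient vectors fit the single row exactly, so the data cannot distinguish between them. The pinv result is simply the shortest such vector, which is a property of the solver rather than evidence of a relationship in the data.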


2014-04-28 16:58:45 jseabold
