Python - Linear regression - Images

I am trying to wrap my head around machine learning in Python. I have been working with the following example (http://scikit-learn.org/stable/auto_examples/plot_multioutput_face_completion.html#example-plot-multioutput-face-completion-py), together with the code sample below.

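I would like to test and validate my understanding of the inner workings of linear regression. The aim is to predict the lower half of a picture from its known upper half. There are originally 300 images of 64×64 (4096 pixels each). The independent variable X is a 300 × 2048 matrix (300 pictures, 2048 pixels each, the upper halves of the pictures), and the dependent variable is also a 300 × 2048 matrix (the lower halves of the pictures). The coefficient matrix appears to be a 2048 × 2048 matrix. Is my understanding correct that: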

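the prediction for a single pixel of y (e.g. picture 1, the upper-left-most pixel) is made by multiplying all 2048 pixels of the upper half of that picture by one set of regression coefficients, so that each missing pixel in the lower half is estimated from all 2048 pixels of that particular image?
the regression coefficients are pixel-specific (each y pixel has its own set of 2048 regression coefficients), and these coefficients are estimated by finding the OLS fit for that particular pixel location across the same pixel location in the 300 available images?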

I may easily have confused myself with the matrices, so please correct me if I am wrong. Many thanks. W

print(__doc__) 

import numpy as np 
import matplotlib.pyplot as plt 

from sklearn.datasets import fetch_olivetti_faces 
from sklearn.utils.validation import check_random_state 

from sklearn.ensemble import ExtraTreesRegressor 
from sklearn.neighbors import KNeighborsRegressor 
from sklearn.linear_model import LinearRegression 
from sklearn.linear_model import RidgeCV 

# Load the faces datasets 
data = fetch_olivetti_faces() 
targets = data.target 

data = data.images.reshape((len(data.images), -1)) 
train = data[targets < 30] 
test = data[targets >= 30] # Test on independent people 

# Test on a subset of people 
n_faces = 5 
rng = check_random_state(4) 
face_ids = rng.randint(test.shape[0], size=(n_faces,)) 
test = test[face_ids, :] 

n_pixels = data.shape[1] 
# Slice indices must be integers; np.ceil/np.floor return floats and fail here
X_train = train[:, :(n_pixels + 1) // 2]  # Upper half of the faces
y_train = train[:, n_pixels // 2:]  # Lower half of the faces
X_test = test[:, :(n_pixels + 1) // 2]
y_test = test[:, n_pixels // 2:]

# Fit estimators 
ESTIMATORS = { 
    "Extra trees": ExtraTreesRegressor(n_estimators=10, max_features=32, 
             random_state=0), 
    "K-nn": KNeighborsRegressor(), 
    "Linear regression": LinearRegression(), 
    "Ridge": RidgeCV(), 
} 

y_test_predict = dict() 
for name, estimator in ESTIMATORS.items(): 
    estimator.fit(X_train, y_train) 
    y_test_predict[name] = estimator.predict(X_test)


2015-10-08 user1885116

Linear regression is unlikely to work well for this problem; it calls for a neural network instead. – par

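@user2662639 The question is about an sklearn demo. Neural networks are not a cure-all: this problem has only 300 training examples, so their performance would not be much different (try fitting an untrained "deep" network to 300 examples and see what happens; your network architecture would need to be much smaller to have a chance of working). A better solution would be to impose smoothness on the output structure, for example with regularized matrix factorization, among other approaches. – eqzx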

You are correct.

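There are 4096 pixels in each picture. Each output pixel in the test set is a linear combination of the 2048 input pixels from the test set, weighted by the coefficients trained for that output pixel.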
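As a small illustration of that linear combination, here is a minimal sketch on synthetic random data. The array sizes (50 samples, 6 inputs, 4 outputs) and the names `x_new` and `j` are hypothetical stand-ins for the 300 faces and 2048 pixels, chosen only to keep the example quick to run.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)

# Hypothetical stand-in sizes: 50 "images", 6 upper-half pixels, 4 lower-half pixels
X = rng.rand(50, 6)
Y = rng.rand(50, 4)

model = LinearRegression().fit(X, Y)

x_new = rng.rand(6)  # upper half of one unseen "image"
j = 2                # index of one lower-half pixel

# Pixel j by hand: its own row of coefficients times all input pixels, plus its intercept
by_hand = model.coef_[j] @ x_new + model.intercept_[j]
by_sklearn = model.predict(x_new.reshape(1, -1))[0, j]

print(model.coef_.shape)                # (n_targets, n_features)
print(np.isclose(by_hand, by_sklearn))  # the two predictions agree
```

Row `j` of `coef_` plays the role of the 2048 coefficients that belong to one lower-half pixel in the face example.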

If you look at the sklearn Linear Regression documentation, you will see that the coefficients of a multi-target regression have shape (n_targets, n_features), here 2048 targets by 2048 features:

In [24]: ESTIMATORS['Linear regression'].coef_.shape
Out[24]: (2048, 2048)

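Under the hood, it calls scipy.linalg.lstsq, so it is important to note that there is no "information sharing" between the coefficients: each output is its own separate linear combination of all 2048 input pixels.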
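One way to check the absence of "information sharing" is to fit every output column on its own and compare the result with the joint multi-output fit. The sketch below assumes small synthetic data (40 samples, 5 inputs, 3 outputs) instead of the faces; the names `joint` and `separate` are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(1)

# Hypothetical stand-in data: 40 samples, 5 input pixels, 3 output pixels
X = rng.rand(40, 5)
Y = rng.rand(40, 3)

# One multi-output fit over all output pixels at once
joint = LinearRegression().fit(X, Y)

# One independent single-output fit per output pixel, stacked row by row
separate = np.vstack([
    LinearRegression().fit(X, Y[:, j]).coef_
    for j in range(Y.shape[1])
])

# If the outputs do not influence each other, the coefficients coincide
print(np.allclose(joint.coef_, separate))
```

If the comparison holds, the multi-output model is equivalent to a stack of independent ordinary least squares fits, one per output pixel.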


2015-10-08 15:03:03 eqzx
