2016-02-18 116 views

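Messy scatter plot regression line: Python

In Python 2.7.6 with matplotlib and scikit-learn 0.17.0 or newer, when I draw polynomial regression lines on a scatter plot, the polynomial curves come out very messy, like this: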

[Image: scatter plot with messy polynomial regression lines]

The script is below. It reads two columns of float data, then draws a scatter plot and the regression curves:

import pandas as pd 
import scipy.stats as stats 
import pylab 
import numpy as np 
import matplotlib.pyplot as plt 
import statsmodels.api as sm 
import pylab as pl 
import sklearn 
from sklearn import preprocessing 
from sklearn.cross_validation import train_test_split 
from sklearn import datasets, linear_model 
from sklearn.linear_model import LinearRegression 
from sklearn.preprocessing import PolynomialFeatures 
from sklearn.pipeline import make_pipeline 
from sklearn.linear_model import Ridge 

df=pd.read_csv("boston_real_estate_market_clean.csv") 

LSTAT = df['LSTAT'].as_matrix() 

LSTAT=LSTAT.reshape(LSTAT.shape[0], 1) 

MEDV=df['MEDV'].as_matrix() 

MEDV=MEDV.reshape(MEDV.shape[0], 1) 

# Train test set split 
X_train1, X_test1, y_train1, y_test1 =    train_test_split(LSTAT,MEDV,test_size=0.3,random_state=1) 

# Polynomial regression, n-th order

plt.scatter(X_test1, y_test1, s=10, alpha=0.3) 

for degree in [1,2,3,4,5]: 
    model = make_pipeline(PolynomialFeatures(degree), Ridge()) 
    model.fit(X_train1,y_train1) 
    y_plot = model.predict(X_test1) 
    plt.plot(X_test1, y_plot, label="degree %d" % degree 
      +'; $q^2$: %.2f' % model.score(X_train1, y_train1) 
      +'; $R^2$: %.2f' % model.score(X_test1, y_test1)) 


plt.legend(loc='upper right') 

plt.show() 

I guess the reason is that "X_test1, y_plot" are not sorted properly?

X_test1 is a numpy array like this:

[[ 5.49] 
[ 16.65] 
[ 17.09] 
.... 
[ 25.68] 
[ 24.39]] 

y_plot is a numpy array like this:

[[ 29.78517812] 
[ 17.16759833] 
[ 16.86462359] 
[ 23.18680265] 
.... 
[ 37.7631725 ]] 

I tried sorting with this:

[X_test1, y_plot] = zip(*sorted(zip(X_test1, y_plot), key=lambda y_plot: y_plot[0])) 

    plt.plot(X_test1, y_plot, label="degree %d" % degree 
       +'; $q^2$: %.2f' % model.score(X_train1, y_train1) 
       +'; $R^2$: %.2f' % model.score(X_test1, y_test1)) 

The curves now look normal, but the result is strange: the legend shows a negative R^2.

[Image: scatter plot after sorting, with smooth curves but negative R^2 values]

Can anyone tell me what the real problem is, or how to sort correctly here? Thanks!


Is it because the square of any real number should be positive... and imaginary numbers are especially weird?


Have you tried passing 'reverse=True' to 'sorted' to reverse the sort? Not sure whether it will work, but it is worth a try.

Answer

1

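Although the plot is now correct, your sort broke the pairing between X_test1 and y_test1, because you forgot to sort y_test1 in the same way. model.score(X_test1, y_test1) then compares predictions for the sorted X values against y values that are still in the old order, which is why R^2 turns negative. The best solution is to sort right after the split. Then y_plot, which is computed later, will automatically come out in the right order (untested; assumes numpy is imported as np):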

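X_train1, X_test1, y_train1, y_test1 = train_test_split(LSTAT, MEDV, test_size=0.3, random_state=1)

# X_test1 has shape (n, 1), so argsort its single column to get a 1-D index;
# np.argsort(X_test1) would sort along the last axis and return all zeros.
sorted_index = np.argsort(X_test1[:, 0])
X_test1 = X_test1[sorted_index]
y_test1 = y_test1[sorted_index]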
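
To see the effect in isolation, here is a minimal sketch that uses only NumPy. Everything in it is an assumption for illustration: the data is synthetic (not the Boston file), np.polyfit stands in for the PolynomialFeatures/Ridge pipeline, and the names r2_score, predict and order are made up for this example. It computes R^2 three ways: on the unsorted pairs, on sorted X against unsorted y (the mistake in the question), and on X and y sorted with the same index.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in for the test split (assumption, not the real data).
rng = np.random.RandomState(1)
X_test = rng.uniform(2.0, 35.0, size=(50, 1))
y_test = 40.0 - 0.9 * X_test + rng.normal(0.0, 1.0, size=(50, 1))

# Stand-in for the fitted model: a degree-2 polynomial from np.polyfit.
coeffs = np.polyfit(X_test[:, 0], y_test[:, 0], 2)


def predict(X):
    """Evaluate the fitted polynomial on an (n, 1) array."""
    return np.polyval(coeffs, X[:, 0]).reshape(-1, 1)


# Baseline: X and y still paired in their original order.
r2_unsorted = r2_score(y_test, predict(X_test))

# One index that sorts by the single feature column.
order = np.argsort(X_test[:, 0])
X_sorted = X_test[order]

# Mistake from the question: predictions follow sorted X, y_test does not.
r2_mismatched = r2_score(y_test, predict(X_sorted))

# Fix: reorder y_test with the same index, so the pairs stay intact.
y_sorted = y_test[order]
r2_sorted = r2_score(y_sorted, predict(X_sorted))

print("R^2, original order:      %.3f" % r2_unsorted)
print("R^2, only X sorted:       %.3f" % r2_mismatched)
print("R^2, X and y both sorted: %.3f" % r2_sorted)
```

Reordering X and y with the same index leaves R^2 unchanged, because the score only depends on which y goes with which prediction. Sorting only one of them scrambles those pairs, so the score drops sharply and can easily fall below zero. If you only need smooth curves, another option (a different technique from sorting) is to predict on an evenly spaced grid such as np.linspace(X.min(), X.max(), 100).reshape(-1, 1) for plotting, and keep the test set untouched for scoring.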