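Messy scatter plot regression line: Python
With Python 2.7.6, matplotlib, and scikit-learn 0.17.0 or newer, when I draw polynomial regression lines on top of a scatter plot, the polynomial curves come out as a messy tangle of lines.
The script is below. It reads two columns of floating-point data, then draws a scatter plot and the regression curves: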
import pandas as pd
import scipy.stats as stats
import pylab
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
import pylab as pl
import sklearn
from sklearn import preprocessing
from sklearn.cross_validation import train_test_split
from sklearn import datasets, linear_model
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import Ridge
df=pd.read_csv("boston_real_estate_market_clean.csv")
LSTAT = df['LSTAT'].as_matrix()
LSTAT=LSTAT.reshape(LSTAT.shape[0], 1)
MEDV=df['MEDV'].as_matrix()
MEDV=MEDV.reshape(MEDV.shape[0], 1)
# Train test set split
X_train1, X_test1, y_train1, y_test1 = train_test_split(LSTAT,MEDV,test_size=0.3,random_state=1)
# Polynomial regression, nth order
plt.scatter(X_test1, y_test1, s=10, alpha=0.3)
for degree in [1,2,3,4,5]:
    model = make_pipeline(PolynomialFeatures(degree), Ridge())
    model.fit(X_train1,y_train1)
    y_plot = model.predict(X_test1)
    plt.plot(X_test1, y_plot, label="degree %d" % degree
             +'; $q^2$: %.2f' % model.score(X_train1, y_train1)
             +'; $R^2$: %.2f' % model.score(X_test1, y_test1))
plt.legend(loc='upper right')
plt.show()
I suspect the cause is that "X_test1, y_plot" are not properly sorted?
X_test1 is a numpy array like this:
[[ 5.49]
[ 16.65]
[ 17.09]
....
[ 25.68]
[ 24.39]]
y_plot is a numpy array like this:
[[ 29.78517812]
[ 17.16759833]
[ 16.86462359]
[ 23.18680265]
....
[ 37.7631725 ]]
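A line plot joins points in the order they appear in the arrays, so x values that jump back and forth make the line double back on itself. The sketch below counts those backward steps, using a handful of made-up x values as a stand-in for X_test1 (the values and the helper name `backward_steps` are illustrative assumptions, not part of the original script).

```python
import numpy as np

# Hypothetical stand-in values; the real X_test1 comes from train_test_split.
x = np.array([5.49, 16.65, 17.09, 25.68, 24.39, 3.11, 12.80])


def backward_steps(values):
    """Count how often consecutive points move right-to-left along the x axis."""
    return int(np.sum(np.diff(values) < 0))


# In the original order the line jumps backwards at least once.
unsorted_reversals = backward_steps(x)

# After sorting, the line only ever moves left to right.
sorted_reversals = backward_steps(np.sort(x))

print(unsorted_reversals, sorted_reversals)
```

Any nonzero count in the unsorted case corresponds to a segment drawn backwards across the plot, which is what produces the tangled look.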
I tried sorting with this:
[X_test1, y_plot] = zip(*sorted(zip(X_test1, y_plot), key=lambda y_plot: y_plot[0]))
plt.plot(X_test1, y_plot, label="degree %d" % degree
         +'; $q^2$: %.2f' % model.score(X_train1, y_train1)
         +'; $R^2$: %.2f' % model.score(X_test1, y_test1))
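The curves now look normal, but the results are strange and include negative R^2 values.
Could anyone tell me what the real problem is, or how to sort correctly here? Thanks!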
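One likely explanation for the negative R^2: the sort overwrites X_test1, so model.score(X_test1, y_test1) then pairs sorted inputs with targets that are still in their original order, and later loop iterations predict from the overwritten X_test1 as well. A minimal sketch of an alternative is to sort copies for plotting only and leave the test arrays untouched. It assumes synthetic data and uses np.polyfit as a plain least-squares stand-in for the PolynomialFeatures + Ridge pipeline; the names x_line, y_line, and r2 are hypothetical.

```python
import numpy as np

rng = np.random.RandomState(1)

# Synthetic stand-in for the LSTAT / MEDV columns (not the real data).
x_train = rng.uniform(2.0, 35.0, size=350)
y_train = 50.0 * np.exp(-0.05 * x_train) + rng.normal(0.0, 1.0, size=350)
x_test = rng.uniform(2.0, 35.0, size=150)
y_test = 50.0 * np.exp(-0.05 * x_test) + rng.normal(0.0, 1.0, size=150)

# Stand-in model: degree-2 least-squares polynomial fitted on the training set.
coeffs = np.polyfit(x_train, y_train, deg=2)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


# Sort copies for plotting only; x_test and y_test keep their row order.
order = np.argsort(x_test)
x_line = x_test[order]
y_line = np.polyval(coeffs, x_line)

# Scoring with inputs and targets in the same row order.
r2_paired = r2(y_test, np.polyval(coeffs, x_test))

# Scoring sorted inputs against unsorted targets, as happens once
# X_test1 has been overwritten by the sorted version.
r2_mismatched = r2(y_test, np.polyval(coeffs, x_line))

print("paired: %.2f, mismatched: %.2f" % (r2_paired, r2_mismatched))
```

In the original loop, the equivalent would be to pass the sorted copies to plt.plot while still calling model.score with the unmodified X_test1 and y_test1. R^2 computed as 1 - SS_res / SS_tot is not literally a square, so it drops below zero whenever the predictions fit worse than the mean of the targets.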
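Is that because the square of any real number should be positive... and imaginary numbers are especially weird?
Have you tried passing 'reverse=True' as an argument to 'sorted' to reverse the sort? Not sure whether it will work, but it's worth a try.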