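LogisticRegression: Unknown label type: 'continuous' using sklearn in Python
I have the following code to test some of the most popular ML algorithms from the sklearn Python library: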
import numpy as np
from sklearn import metrics, svm
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
trainingData = np.array([ [2.3, 4.3, 2.5], [1.3, 5.2, 5.2], [3.3, 2.9, 0.8], [3.1, 4.3, 4.0] ])
trainingScores = np.array([3.4, 7.5, 4.5, 1.6])
predictionData = np.array([ [2.5, 2.4, 2.7], [2.7, 3.2, 1.2] ])
clf = LinearRegression()
clf.fit(trainingData, trainingScores)
print("LinearRegression")
print(clf.predict(predictionData))
clf = svm.SVR()
clf.fit(trainingData, trainingScores)
print("SVR")
print(clf.predict(predictionData))
clf = LogisticRegression()
clf.fit(trainingData, trainingScores)
print("LogisticRegression")
print(clf.predict(predictionData))
clf = DecisionTreeClassifier()
clf.fit(trainingData, trainingScores)
print("DecisionTreeClassifier")
print(clf.predict(predictionData))
clf = KNeighborsClassifier()
clf.fit(trainingData, trainingScores)
print("KNeighborsClassifier")
print(clf.predict(predictionData))
clf = LinearDiscriminantAnalysis()
clf.fit(trainingData, trainingScores)
print("LinearDiscriminantAnalysis")
print(clf.predict(predictionData))
clf = GaussianNB()
clf.fit(trainingData, trainingScores)
print("GaussianNB")
print(clf.predict(predictionData))
clf = SVC()
clf.fit(trainingData, trainingScores)
print("SVC")
print(clf.predict(predictionData))
The first two work fine, but the LogisticRegression call fails with the following error:
/home/ouhma# python stack.py
LinearRegression
[ 15.72023529 6.46666667]
SVR
[ 3.95570063 4.23426243]
Traceback (most recent call last):
  File "stack.py", line 28, in <module>
    clf.fit(trainingData, trainingScores)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/logistic.py", line 1174, in fit
    check_classification_targets(y)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/multiclass.py", line 172, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous'
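The input data is the same as in the previous calls, so what is going on here?
By the way, why is there such a huge difference between the first prediction of LinearRegression() and that of SVR() (15.72 vs 3.95)?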
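The check that raises in the traceback lives in sklearn/utils/multiclass.py, and the same module exposes type_of_target, which can be called directly to see how scikit-learn categorizes a target array. The following is only a small diagnostic sketch: the snake_case variable names and the two integer arrays are illustrative and not part of the original script.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Same target values as trainingScores in the script above.
training_scores = np.array([3.4, 7.5, 4.5, 1.6])

# Illustrative integer targets (assumptions, not from the original script).
binary_labels = np.array([0, 1, 1, 0])
multiclass_labels = np.array([3, 7, 4, 1])

# Non-integer floats are reported as a regression-style target.
print(type_of_target(training_scores))

# Integer targets are reported as classification-style targets.
print(type_of_target(binary_labels))
print(type_of_target(multiclass_labels))
```

Classifiers such as LogisticRegression reject a target that is reported as continuous, while regressors such as LinearRegression and SVR accept it, which matches the behaviour seen in the output above.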
Thanks! So I have to convert '2.3' to '23' and so on, right? Is there an elegant way to do that conversion with numpy or pandas? – harrison4
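As a rough sketch of what such a conversion could look like with numpy alone, here are two options applied to the scores from the question. The bin edges in the second option are hypothetical and chosen only for illustration. Note that the first option turns every distinct score into its own class, which is rarely what a classifier is meant to learn.

```python
import numpy as np

# Same target values as trainingScores in the question.
training_scores = np.array([3.4, 7.5, 4.5, 1.6])

# Option 1: scale by 10 and round to the nearest integer.
# np.rint guards against floating-point results such as 16.000000000000002.
scaled_labels = np.rint(training_scores * 10).astype(int)
print(scaled_labels)  # [34 75 45 16]

# Option 2: group the scores into ranges.
# The edges 2.5 and 5.0 are hypothetical: below 2.5 -> 0, 2.5 to 5.0 -> 1, 5.0 and above -> 2.
bin_edges = np.array([2.5, 5.0])
binned_labels = np.digitize(training_scores, bin_edges)
print(binned_labels)  # [1 2 1 0]
```

If the scores are genuinely continuous quantities, keeping a regressor is probably the more natural choice than forcing them into classes.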
But in this example, the input data passed to LogisticRegression contains floats: http://machinelearningmastery.com/compare-machine-learning-algorithms-python-scikit-learn/ ...and it works fine. Why? – harrison4
The input can be floats, but the output needs to be categorical, i.e. int. In that example, column 8 contains only 0 or 1. Usually you have categorical labels such as ['red', 'big', 'sick'], and you need to convert them to numeric values. Try http://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features or http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html –
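A minimal sketch of that suggestion, assuming the same float feature matrix as in the question and a made-up set of string labels ("low" and "high" are hypothetical and not from the original data): LabelEncoder maps the labels to integers, and LogisticRegression then fits without complaint even though the inputs are floats.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# Float features, as in the question.
training_data = np.array([[2.3, 4.3, 2.5], [1.3, 5.2, 5.2],
                          [3.3, 2.9, 0.8], [3.1, 4.3, 4.0]])
prediction_data = np.array([[2.5, 2.4, 2.7], [2.7, 3.2, 1.2]])

# Hypothetical categorical labels, one per training row.
raw_labels = ["low", "high", "high", "low"]

# LabelEncoder assigns integers in sorted class order: "high" -> 0, "low" -> 1.
encoder = LabelEncoder()
encoded_labels = encoder.fit_transform(raw_labels)
print(encoded_labels)  # [1 0 0 1]

# Float inputs with integer class labels are accepted by the classifier.
clf = LogisticRegression()
clf.fit(training_data, encoded_labels)

# Map the predicted integers back to the original label names.
predicted_names = encoder.inverse_transform(clf.predict(prediction_data))
print(predicted_names)
```

With only four training rows the predictions themselves carry no meaning; the point is only that the fit succeeds once the target is discrete.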