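How to check classification with sklearn
I ran two different classification algorithms on my data, logistic regression and naive Bayes, but they give me the same accuracy even when I change the ratio of training to test data. Below is the code I am using: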
import pandas as pd
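# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split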
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
df = pd.read_csv('Speed Dating.csv', encoding = 'latin-1')
# Select the feature columns together with the target column
X = df[['d_age', 'match', 'importance_same_religion',
        'importance_same_race', 'diff_partner_rating']].copy()
# Drop NAs
X = X.dropna(axis=0)
# Categorical variable Match [Yes, No]
y = X['match']
# Drop y from X
X = X.drop(['match'], axis=1)
# Transformation
scalar = StandardScaler()
X = scalar.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Logistic Regression
model = LogisticRegression(penalty='l2', C=1)
model.fit(X_train, y_train)
print('Accuracy Score with Logistic Regression: ', accuracy_score(y_test, model.predict(X_test)))
#Naive Bayes
model_2 = GaussianNB()
model_2.fit(X_train, y_train)
print('Accuracy Score with Naive Bayes: ', accuracy_score(y_test, model_2.predict(X_test)))
print(model_2.predict(X_test))
Is it possible to get the same accuracy every time?
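Could it be that your input, 'X', is a 'numpy array' while your target, 'y', is a 'pandas Series' object, so there is a type mismatch when calling 'train_test_split' that keeps the model's accuracy from changing? You can use 'y.values' to convert 'y' to an array and then check whether that is the problem. –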
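One way to test the suggestion in this comment is to score the same model twice, once with 'y' as a pandas Series and once with 'y.values'. The sketch below is only an illustration: it uses synthetic data instead of 'Speed Dating.csv', and the helper name fit_and_score is hypothetical. The Series is given a gapped index on purpose, to mimic what 'dropna' leaves behind.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the real data (assumption, not the asker's dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # NumPy array, like the output of StandardScaler
y = pd.Series((X[:, 0] > 0).astype(int),
              index=np.arange(0, 400, 2),  # gapped index, as after dropna
              name='match')

def fit_and_score(features, target):
    """Hypothetical helper: split, fit GaussianNB, return test accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, target, test_size=0.2, random_state=42)
    model = GaussianNB().fit(X_tr, y_tr)
    return accuracy_score(y_te, model.predict(X_te))

acc_series = fit_and_score(X, y)         # y passed as a pandas Series
acc_array = fit_and_score(X, y.values)   # y converted to a NumPy array
print(acc_series, acc_array)
```

If the two printed scores match, the mix of array and Series types is probably not what keeps the accuracy constant, since 'train_test_split' splits both inputs by position.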
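That was an issue, and I have fixed it by converting everything to DataFrames, but I still get similar accuracy. In fact, I found that the accuracy is, for example, 80% because 80% of the test data consists of zeros, so the model is not really working at all. – muazfaiz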
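The observation in this last comment can be checked directly by comparing the model against a baseline that always predicts the majority class. What follows is a rough sketch under stated assumptions: the data is synthetic (about 80% zeros, with features that carry no signal), not the asker's file, and the use of DummyClassifier, balanced_accuracy_score and a stratified split is a suggestion rather than something from the thread.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             confusion_matrix)
from sklearn.model_selection import train_test_split

# Synthetic stand-in (assumption): roughly 80% of labels are 0, features are noise
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (rng.random(1000) < 0.2).astype(int)

# stratify=y keeps the class ratio the same in the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Baseline that always predicts the most frequent training class
baseline = DummyClassifier(strategy='most_frequent').fit(X_train, y_train)
model = LogisticRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
model_acc = accuracy_score(y_test, y_pred)
model_bal_acc = balanced_accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted

print('Share of zeros in y_test:', np.mean(y_test == 0))
print('Baseline accuracy:', baseline_acc)
print('Model accuracy:', model_acc)
print('Model balanced accuracy:', model_bal_acc)
print(cm)
```

If the model accuracy is about the same as the baseline accuracy, and the second column of the confusion matrix is empty or nearly so, the classifier is mostly predicting the majority class. That would explain why different algorithms and split ratios report nearly the same number.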