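I'm trying to build a random forest classifier for binary classification. Can someone explain why the accuracy score changes every time I run this program? It varies between 68% and 74%. I've also tried tuning the parameters, but I can't get the accuracy above 74%, so any suggestions on that would be appreciated too. I tried GridSearchCV, but only managed a modest 3-point increase.
Random forest classifier: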
#import libraries
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn import preprocessing
#read data into pandas dataframe
df = pd.read_csv("data.csv")
#handle missing values
df = df.dropna(axis=0, how='any')
#handle string-type data: encode 'Sex' as integers (LabelEncoder sorts its classes, so Female=0, Male=1)
le = preprocessing.LabelEncoder()
le.fit(['Male','Female'])
df['Sex'] = le.transform(df['Sex'])
#split into train and test data (roughly 80/20; no seed is set, so the split differs on every run)
df['is_train'] = np.random.uniform(0, 1, len(df)) <= 0.8
train, test = df[df['is_train']], df[~df['is_train']]
#feature columns: the first 10 columns (assumes the label 'Selector' is the 11th column and 'is_train' comes after it)
features = df.columns[:10]
#build the classifier (no random_state, so each run grows a different forest)
clf = RandomForestClassifier()
#train the classifier
y = train['Selector']
clf.fit(train[features], y)
#test the classifier
predictions = clf.predict(test[features])
#calculate accuracy on the test set and on the training set
print('test accuracy:', accuracy_score(test['Selector'], predictions))
print('train accuracy:', accuracy_score(train['Selector'], clf.predict(train[features])))
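The code above has two unseeded sources of randomness: the `np.random.uniform` train/test split, and the forest itself (`RandomForestClassifier` draws bootstrap samples and random feature subsets, and `random_state` defaults to `None`). With only about 580 rows, a 20% test set is roughly 117 rows, so a single extra misclassification moves accuracy by just under one percentage point. The sketch below is a minimal illustration under stated assumptions: it uses synthetic data from `make_classification` as a stand-in for `data.csv`, swaps the `is_train` mask for scikit-learn's `train_test_split`, and the seed values and the helper name `holdout_accuracy` are arbitrary, hypothetical choices. It shows that fixing both seeds makes the score repeatable, that varying them reproduces the run-to-run spread, and that cross-validation gives a steadier estimate than any one split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in for data.csv (assumption): 583 rows, 10 features,
# roughly a 71/29 class balance
X, y = make_classification(n_samples=583, n_features=10, n_informative=5,
                           weights=[0.71, 0.29], random_state=0)


def holdout_accuracy(seed):
    """Hypothetical helper: 80/20 holdout accuracy where both the split and the forest use `seed`."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_train, y_train)
    return accuracy_score(y_test, clf.predict(X_test))


# Same seed every time -> identical score on every call
same = [holdout_accuracy(42) for _ in range(3)]

# Different seeds -> the kind of run-to-run spread described in the question
spread = [holdout_accuracy(seed) for seed in range(10)]

# 5-fold stratified cross-validation: average over several splits instead of one
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                            cv=cv, scoring="accuracy")

print("fixed seed:", same)
print("varying seeds: min %.3f, max %.3f" % (min(spread), max(spread)))
print("5-fold CV: %.3f +/- %.3f" % (cv_scores.mean(), cv_scores.std()))
```

Reporting the cross-validated mean and standard deviation is usually more informative than chasing a single split's score, because a one-split "improvement" of a few points can fall inside the spread shown here.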
Link to the dataset: https://archive.ics.uci.edu/ml/datasets/ILPD+(Indian+Liver+Patient+Dataset) – TheBeginner
To improve your model, I'd suggest using ensemble methods, and also trying XGBoost. –
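A fair way to act on this suggestion is to score every candidate on the same cross-validation folds, next to a majority-class baseline. If the class balance is roughly 416 patients to 167 non-patients, as UCI lists for ILPD, always predicting the majority class already scores about 71%, which puts the 68% to 74% range in perspective. The sketch below is an assumption-laden illustration, not a tuned solution: it again uses synthetic `make_classification` data in place of the real CSV, and it uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost so that nothing beyond scikit-learn is required. If the `xgboost` package is installed, its `XGBClassifier` follows the scikit-learn estimator interface and could be added to the `models` dict the same way.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the ILPD features and labels (assumption, not the real data)
X, y = make_classification(n_samples=583, n_features=10, n_informative=5,
                           weights=[0.71, 0.29], random_state=0)

models = {
    # Baseline: always predict the most frequent class
    "majority baseline": DummyClassifier(strategy="most_frequent"),
    "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
    # Boosted trees from scikit-learn, standing in for XGBoost
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

# One fixed set of stratified folds so every model is scored on identical splits
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name:>18}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

A model is only worth keeping if its mean accuracy clears the baseline by more than the fold-to-fold standard deviation.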