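Adding a column to a dataset in Python

I want to add my predicted values back to my original dataset in Python. I think I should use pandas with assign and pd.DataFrame, but after reading all the documentation I still can't work out how to write the code (sorry, I'm new and have only just started learning to code). My code is below; I only need help with the part that adds my predictions back to the dataset. Thanks for your help!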
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
# Splitting the dataset into the Training set and Test set
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25,
                                                    random_state = 0)
# Feature Scaling X_train and X_test
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Feature scaling all independent variables used to build the model
whole_dataset = sc.transform(X)
# Fitting classifier to the Training set
# Create your Naive Bayes here
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)
# Predicting the Test set results
y_pred = classifier.predict_proba(X_test)
# Predicting the results for the whole dataset
y_pred2 = classifier.predict_proba(whole_dataset)
# Add y_pred2 predictions back to the dataset
???
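Looking at what you're trying to do, I think you've misunderstood what is happening. You split your dataset into training and test data, trained on the training set, and then predicted on the test data. You then try to assign the result back to every row of the original dataset. For example, you have 400 rows in your dataset but only 100 rows in y_pred, so you can't assign rows of differing lengths. What you want to do is 'y_pred = classifier.predict_proba(X)' and then assign it back: 'dataset['predict_class_1'], dataset['predict_class_2'] = y_pred[:,0], y_pred[:,1]' – EdChum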
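Thank you so much, I'll give it a try! :) I changed the code a little and it now predicts for all 400 rows. I can't upload the data file here, but it can be downloaded from https://www.superdatascience.com/machine-learning/ in the Section 18 Naive Bayes zip file. The csv file is called Social_Network_Ads.csv. I hope I can get it working :) – zipline86
@EdChum It worked! Thank you! – zipline86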
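For anyone landing here later, EdChum's suggestion can be sketched roughly as follows. This is only an illustration under stated assumptions: the CSV is not available here, so the block builds a synthetic 400-row stand-in with hypothetical column names (Age, EstimatedSalary, Purchased), and the output column names (proba_class_0, proba_class_1, predicted) are made up for the example. It scales the full feature matrix with the scaler fitted on the training split, as the question's whole_dataset line does, so the prediction array has one row per row of the DataFrame.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for Social_Network_Ads.csv (assumption: two numeric
# feature columns and a binary target; column names are hypothetical).
rng = np.random.default_rng(0)
n_rows = 400
age = rng.integers(18, 60, size=n_rows)
salary = rng.integers(15000, 150000, size=n_rows)
purchased = (age + salary / 5000 + rng.normal(0, 5, n_rows) > 55).astype(int)
dataset = pd.DataFrame(
    {"Age": age, "EstimatedSalary": salary, "Purchased": purchased}
)

X = dataset[["Age", "EstimatedSalary"]].to_numpy(dtype=float)
y = dataset["Purchased"].to_numpy()

# Train on 75% of the rows, exactly as in the question.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
classifier = GaussianNB().fit(X_train, y_train)

# Predict for EVERY row, scaled with the scaler fitted on the training split,
# so the result has the same length as the DataFrame.
whole_dataset = sc.transform(X)
proba = classifier.predict_proba(whole_dataset)  # shape: (n_rows, 2)

# Column i of predict_proba corresponds to classifier.classes_[i].
dataset["proba_class_0"] = proba[:, 0]
dataset["proba_class_1"] = proba[:, 1]

# DataFrame.assign does the same job but returns a new DataFrame.
with_preds = dataset.assign(predicted=classifier.predict(whole_dataset))
print(with_preds.head())
```

Direct column assignment modifies dataset in place, while assign leaves it untouched and returns a copy with the extra column; which one to use is mostly a matter of taste. Either way, the array being assigned needs exactly as many rows as the DataFrame, which is why predicting on X_test alone (100 rows) could not be written back to a 400-row dataset.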