In scikit-learn, you get the tool train_test_split:
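# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split
from sklearn import datasets
# Use Age and Weight to predict a value for the food someone chooses
# (table is assumed to be a pandas DataFrame containing these columns)
X_train, X_test, y_train, y_test = train_test_split(table[['Age', 'Weight']],
                                                    table['Food Choice'],
                                                    test_size=0.25)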
# Another example using the sklearn pre-loaded datasets:
iris = datasets.load_iris()
X_iris, y_iris = iris.data, iris.target
X, y = X_iris[:, :2], y_iris
X_train, X_test, y_train, y_test = train_test_split(X, y)
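This splits the data into, in that order:
- input data for training
- input data for evaluation
- output data for training
- output data for evaluation
You can also add a keyword argument, test_size=0.25, to change the fraction of the data held out for testing (the rest is used for training).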
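To make the effect of test_size concrete, here is a minimal sketch on the iris data used above. The random_state=0 argument and the `*40` variable names are my own additions for reproducibility and illustration, not something the split requires.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target  # 150 samples, first two features

# Default split: 25% of the rows are held out for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
print(X_train.shape, X_test.shape)

# Larger hold-out: 40% of the rows go to the test set
X_train40, X_test40, y_train40, y_test40 = train_test_split(
    X, y, test_size=0.4, random_state=0)
print(X_train40.shape, X_test40.shape)
```

With 150 rows, the default split should leave 112 rows for training and 38 for testing, since the test count is rounded up.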
To split a single dataset, you can use a call like this to get 40% test data:
>>> import numpy as np
>>> from sklearn.model_selection import train_test_split
>>> data = np.arange(700).reshape((100, 7))
>>> training, testing = train_test_split(data, test_size=0.4)
>>> print(len(data))
100
>>> print(len(training))
60
>>> print(len(testing))
40
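When features and labels are passed together, the same shuffled row indices are applied to both arrays, so each label stays paired with its row. The sketch below checks that on toy data of my own making (the feature is just the row id and the label is its parity). The optional stratify=y argument additionally asks the split to keep the class proportions; the split itself is random and does not otherwise look at the labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (illustrative only): feature column holds the row id,
# label is the parity of that id, so the pairing can be verified later
X = np.arange(100).reshape((100, 1))
y = X[:, 0] % 2

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42, stratify=y)

# Rows stay paired: every label still matches its feature row
print(np.array_equal(y_train, X_train[:, 0] % 2))
print(np.array_equal(y_test, X_test[:, 0] % 2))

# With stratify=y the class balance is preserved in the test set
print(np.bincount(y_test))
```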
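Does this function understand that it should split the data based on the target/label variable? It isn't written anywhere in the documentation. – poiuytrez 2014-10-27 13:08:07
I added another example where you explicitly choose the variables and the target – 2014-10-27 13:12:42
...and another that randomly splits the input "data" into two arrays, 60:40 – 2014-10-27 13:23:00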