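I am quite new to Python and SKLearn. I am trying to build a simple classifier, but I have run into a problem. I have been following a few different tutorials, but I get an error when I try to use the .fit method. I am new to this concept and have tried reading the documentation, but I found it hard to understand. Can anyone help me fix the error, or point me in the right direction?
Python classifier Sklearn
My guess at the cause of the error is that a value is out of range for the dtype, because I have already replaced all the missing or NaN values, yet the error still appears.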
Code
def main():
    setup_files()
    imputer = Imputer()
    # the training data minus id and type:
    t_num_data = load_csv(training_set_file_path, range(1, 17))
    t_num_data_imputed = imputer.fit_transform(t_num_data)
    print(t_num_data_imputed)
    # the training type column
    t_type_col = load_csv(training_set_file_path, 17, dtype=np.dtype((str, 5)))
    # the query data minus id and type:
    q_data = load_csv(queries_file_path, range(1, 17))
    # the query id column
    q_id = load_csv(queries_file_path, 0, dtype=np.dtype((str, 10)))
    # fit data above to DTC and predict import
    model = tree.DecisionTreeClassifier(criterion='entropy')
    model.fit_transform(t_num_data, t_type_col)
    predictions = model.predict(q_data)
    # output the predictions:
    with open(solutions_file_path, 'w') as f:
        for i in range(len(predictions)):
            f.write("{},{}\n".format(q_id[i], predictions[i]))
Error
Traceback (most recent call last):
  File "/Users/Rory/Desktop/classifier.py", line 71, in <module>
    main()
  File "/Users/Rory/Desktop/classifier.py", line 60, in main
    model.fit_transform(t_num_data, t_type_col)
  File "/Users/Rory/anaconda/lib/python2.7/site-packages/sklearn/base.py", line 458, in fit_transform
    return self.fit(X, y, **fit_params).transform(X)
  File "/Users/Rory/anaconda/lib/python2.7/site-packages/sklearn/tree/tree.py", line 154, in fit
    X = check_array(X, dtype=DTYPE, accept_sparse="csc")
  File "/Users/Rory/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 398, in check_array
    _assert_all_finite(array)
  File "/Users/Rory/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 54, in _assert_all_finite
    " or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
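The ValueError names three possible causes: NaN, infinity, or a value too large for float32. One way to tell which applies is to count each kind per column. The following is a minimal sketch using NumPy only; `describe_non_finite` and the `sample` array are hypothetical names standing in for the question's `t_num_data`, not part of the original code.

```python
import numpy as np


def describe_non_finite(data):
    """Count NaN, inf, and float32-overflowing entries per column (hypothetical helper)."""
    data = np.asarray(data, dtype=np.float64)
    nan_per_col = np.isnan(data).sum(axis=0)
    inf_per_col = np.isinf(data).sum(axis=0)
    # Values that are finite in float64 but exceed the float32 range
    f32_max = np.finfo(np.float32).max
    too_large_per_col = (np.isfinite(data) & (np.abs(data) > f32_max)).sum(axis=0)
    return nan_per_col, inf_per_col, too_large_per_col


# Made-up data standing in for t_num_data
sample = np.array([[1.0, np.nan, 3.0],
                   [4.0, 5.0, np.inf],
                   [7.0, 8.0, 1e39]])

nan_counts, inf_counts, big_counts = describe_non_finite(sample)
print("NaN per column:      ", nan_counts)
print("inf per column:      ", inf_counts)
print("too large per column:", big_counts)

# A plain min propagates NaN, while nanmin skips it
print("min:", np.min(sample), "nanmin:", np.nanmin(sample))
```

If a plain min or max comes back as `nan`, at least one NaN is still in the array being checked; `np.nanmin` and `np.nanmax` ignore NaN and show the range of the remaining values.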
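The error says it all. Your 't_num_data' has inf or nan values. Try printing the min/max –
, is there a simple fix for this, or is the problem in the data itself? – JJSmith
@imaluengo When I print the max and min, I get two – JJSmith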
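Looking at the code in the question, the imputed array `t_num_data_imputed` is computed and printed, but the model is then fitted on the original `t_num_data`, which still contains the missing values, and `q_data` is never imputed at all. The sketch below shows one possible fix under some assumptions: it uses `SimpleImputer` from `sklearn.impute` (the replacement for the older `Imputer` class in newer scikit-learn releases), calls `fit` rather than `fit_transform` on the classifier, and uses small made-up arrays in place of what `load_csv` returns. With the older `Imputer` used in the question, the same pattern should apply.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.tree import DecisionTreeClassifier

# Made-up stand-ins for the arrays load_csv returns in the question
t_num_data = np.array([[1.0, 2.0],
                       [np.nan, 3.0],
                       [5.0, np.nan],
                       [6.0, 8.0]])
t_type_col = np.array(["a", "a", "b", "b"])
q_data = np.array([[1.5, np.nan],
                   [np.nan, 7.5]])

# Learn the column means on the training data, then reuse them for the queries
imputer = SimpleImputer(strategy="mean")
t_num_data_imputed = imputer.fit_transform(t_num_data)
q_data_imputed = imputer.transform(q_data)

# Fit on the imputed array, not the original one that still holds NaN
model = DecisionTreeClassifier(criterion="entropy", random_state=0)
model.fit(t_num_data_imputed, t_type_col)
predictions = model.predict(q_data_imputed)
print(predictions)
```

Calling `transform` (not `fit_transform`) on the query data keeps the fill values consistent with those learned from the training set.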