2017-10-17 208 views

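I am trying to reproduce a tutorial I saw here about machine learning with scikit-learn in Python.

Everything worked perfectly until I called the .fit method on my training set.

Here is a sample of my code: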

# TRAINING PART 

train_dir = 'pdf/learning_set' 
dictionary = make_dic(train_dir) 

train_labels = np.zeros(20) 
train_labels[17:20] = 1 
train_matrix = extract_features(train_dir) 
model1 = MultinomialNB() 
model1.fit(train_matrix, train_labels) 


# TESTING PART 

test_dir = 'pdf/testing_set' 
test_matrix = extract_features(test_dir) 
test_labels = np.zeros(8) 
test_labels[4:7] = 1 
result1 = model1.predict(test_matrix) 
print(confusion_matrix(test_labels, result1)) 

Here is my traceback:

Traceback (most recent call last):
  File "ML.py", line 65, in <module>
    model1.fit(train_matrix, train_labels)
  File "/usr/local/lib/python3.6/site-packages/sklearn/naive_bayes.py", line 579, in fit
    X, y = check_X_y(X, y, 'csr')
  File "/usr/local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 552, in check_X_y
    check_consistent_length(X, y)
  File "/usr/local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 173, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [23, 20]

How can I solve this problem? I am using Python 3.6 on Ubuntu 16.04.

Answer


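ValueError: Found input variables with inconsistent numbers of samples: [23, 20]

This means you have 23 training vectors (train_matrix has 23 rows) but only 20 training labels (train_labels is an array of 20 values).

Change train_labels = np.zeros(20) to train_labels = np.zeros(23) and it should work.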
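
To avoid hard-coding the number 23, one option is to size the label array from the feature matrix itself. The snippet below is only a sketch: the zero matrix stands in for the output of extract_features(train_dir), the feature count of 5 is made up, and marking the last three rows as class 1 is a hypothetical labelling that depends on the order in which your files are read.

```python
import numpy as np

# Stand-in for extract_features(train_dir): 23 documents with 5 features each.
# (The feature count is an assumption made only for this illustration.)
train_matrix = np.zeros((23, 5))

# Size the label array from the matrix so the two lengths cannot disagree.
n_samples = train_matrix.shape[0]
train_labels = np.zeros(n_samples)

# Hypothetical labelling: treat the last three documents as the positive class.
train_labels[-3:] = 1

# Fail early with a readable message before calling model1.fit(...).
assert train_matrix.shape[0] == train_labels.shape[0], (
    "got %d feature rows but %d labels"
    % (train_matrix.shape[0], train_labels.shape[0])
)
```

The same check applies to the test set: test_labels must have as many entries as test_matrix has rows, or confusion_matrix will be comparing arrays of different lengths.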


Thank you very much, it works perfectly! That was a silly mistake, haha.