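I'm trying to train a KNeighborsClassifier on non-numeric data, and I supply a custom metric that computes a similarity score between samples. With non-numeric data, KNeighborsClassifier fails: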
from sklearn.neighbors import KNeighborsClassifier

# Compute the "ASCII" distance:
def my_metric(a, b):
    return ord(a) - ord(b)

# Samples and labels
X = [["a"], ["b"], ["c"], ["m"], ["z"]]
# S=Start of the alphabet, M=Middle, E=end
y = ["S", "S", "S", "M", "E"]

model = KNeighborsClassifier(metric=my_metric)
model.fit(X, y)

X_test = [["e"], ["f"], ["w"]]
y_test = [["S"], ["M"], ["E"]]
model.score(X_test, y_test)
I get the following error:
Traceback (most recent call last):
  File "/home/marcofavorito/virtualenvs/nlp/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-20-e339c96eea22>", line 1, in <module>
    model.score(X_test, y_test)
  File "/home/marcofavorito/virtualenvs/nlp/lib/python3.5/site-packages/sklearn/base.py", line 350, in score
    return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
  File "/home/marcofavorito/virtualenvs/nlp/lib/python3.5/site-packages/sklearn/neighbors/classification.py", line 145, in predict
    neigh_dist, neigh_ind = self.kneighbors(X)
  File "/home/marcofavorito/virtualenvs/nlp/lib/python3.5/site-packages/sklearn/neighbors/base.py", line 361, in kneighbors
    **self.effective_metric_params_)
  File "/home/marcofavorito/virtualenvs/nlp/lib/python3.5/site-packages/sklearn/metrics/pairwise.py", line 1247, in pairwise_distances
    return _parallel_pairwise(X, Y, func, n_jobs, **kwds)
  File "/home/marcofavorito/virtualenvs/nlp/lib/python3.5/site-packages/sklearn/metrics/pairwise.py", line 1090, in _parallel_pairwise
    return func(X, Y, **kwds)
  File "/home/marcofavorito/virtualenvs/nlp/lib/python3.5/site-packages/sklearn/metrics/pairwise.py", line 1104, in _pairwise_callable
    X, Y = check_pairwise_arrays(X, Y)
  File "/home/marcofavorito/virtualenvs/nlp/lib/python3.5/site-packages/sklearn/metrics/pairwise.py", line 110, in check_pairwise_arrays
    warn_on_dtype=warn_on_dtype, estimator=estimator)
  File "/home/marcofavorito/virtualenvs/nlp/lib/python3.5/site-packages/sklearn/utils/validation.py", line 402, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: could not convert string to float: 'e'
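I think I could easily implement the algorithm myself, but then I would lose all the features of the sklearn classifiers. Am I missing some option? Or is it impossible to train the model unless I first translate the samples into floats?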
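To make the "implement it myself" option concrete, here is a minimal pure-Python sketch of nearest-neighbour voting with a custom metric. The names `ascii_distance` and `knn_predict` are made up for illustration, the metric uses an absolute difference (an assumption, so that it behaves like a distance), and samples are plain characters rather than one-element lists. It offers none of sklearn's features such as scoring, weighting, or tree-based search.

```python
from collections import Counter


def ascii_distance(a, b):
    # Hypothetical metric: absolute gap between the code points of two characters.
    return abs(ord(a) - ord(b))


def knn_predict(train_X, train_y, query, metric, k=1):
    # Rank training samples by their distance to the query and keep the k closest.
    nearest = sorted(range(len(train_X)),
                     key=lambda i: metric(train_X[i], query))[:k]
    # Majority vote among the labels of those neighbours.
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


X = ["a", "b", "c", "m", "z"]
y = ["S", "S", "S", "M", "E"]

# Classify each test character by its single nearest training character.
predictions = [knn_predict(X, y, q, ascii_distance, k=1) for q in ["e", "f", "w"]]
print(predictions)
```

Because the metric is only ever called on the raw objects, nothing here requires the samples to be convertible to floats.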
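N.B. I know this particular example could be solved by using numbers instead of characters. But I need to solve another problem that involves non-numeric data, and I can't find a simple mapping to floats like the one described above.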
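One possible route that avoids any per-sample float mapping is to compute the pairwise distances yourself and pass them with metric="precomputed", which KNeighborsClassifier accepts. The sketch below is only an illustration under assumptions: `ascii_distance` and `distance_matrix` are hypothetical helpers, the metric is made non-negative with `abs`, and `n_neighbors=1` is chosen because the toy training set is tiny.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def ascii_distance(a, b):
    # Hypothetical metric working directly on the non-numeric samples.
    return abs(ord(a) - ord(b))


def distance_matrix(rows, cols, metric):
    # Hypothetical helper: entry [i, j] is the distance from rows[i] to cols[j].
    return np.array([[metric(r, c) for c in cols] for r in rows], dtype=float)


X_train = ["a", "b", "c", "m", "z"]
y_train = ["S", "S", "S", "M", "E"]
X_test = ["e", "f", "w"]

model = KNeighborsClassifier(n_neighbors=1, metric="precomputed")
# With a precomputed metric, fit expects the train-vs-train distance matrix.
model.fit(distance_matrix(X_train, X_train, ascii_distance), y_train)
# predict expects test-vs-train distances, shape (n_test, n_train).
predicted = model.predict(distance_matrix(X_test, X_train, ascii_distance))
print(predicted)
```

The trade-off is that the full distance matrix must be built in Python, which costs one metric call per pair of samples.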
I knew I was missing an important point! Thanks for the info on the metric attribute. :)