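Python numpy pairwise edit-distance

So, I have a numpy array of strings, and I want to calculate the pairwise edit distance between each pair of elements using this function: scipy.spatial.distance.pdist from http://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.spatial.distance.pdist.html

A sample of my array is as follows: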
>>> d[0:10]
array(['TTTTT', 'ATTTT', 'CTTTT', 'GTTTT', 'TATTT', 'AATTT', 'CATTT',
       'GATTT', 'TCTTT', 'ACTTT'],
      dtype='|S5')
However, since it does not have an "editdistance" option, I want to supply a custom distance function. I tried this and got the following error:
>>> import editdist
>>> import scipy
>>> import scipy.spatial
>>> scipy.spatial.distance.pdist(d[0:10], lambda u,v: editdist.distance(u,v))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/epd-7.3.2/lib/python2.7/site-packages/scipy/spatial/distance.py", line 1150, in pdist
    [X] = _copy_arrays_if_base_present([_convert_to_double(X)])
  File "/usr/local/epd-7.3.2/lib/python2.7/site-packages/scipy/spatial/distance.py", line 153, in _convert_to_double
    X = np.double(X)
ValueError: could not convert string to float: TTTTT
It looks like it just isn't meant for strings. You may want to look at https://docs.python.org/2/library/difflib.html – Pavel
The error line is the second line in 'pdist'. So you need to convert the strings to some kind of number before passing them to 'pdist'. 'pdist' also wants a 2D array. – hpaulj
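
One way to act on the comment above, as a rough sketch rather than a definitive answer: hand 'pdist' a 2D column of row indices (which it can convert to floats) and let the custom metric look the strings up again. The `levenshtein` helper below is a hypothetical pure-Python stand-in for `editdist.distance`, included only so the example runs without that package; it assumes Python 3 and a shortened version of the array from the question.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def levenshtein(a, b):
    """Dynamic-programming edit distance (stand-in for editdist.distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            # deletion, insertion, substitution
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Shortened sample of the array from the question
d = np.array(['TTTTT', 'ATTTT', 'CTTTT', 'GTTTT', 'TATTT'])

# pdist wants a 2D numeric array, so pass one row index per string
idx = np.arange(len(d)).reshape(-1, 1)

# The metric receives two 1-element rows; map them back to the strings
condensed = pdist(idx, lambda u, v: levenshtein(d[int(u[0])], d[int(v[0])]))

# Optional: expand the condensed vector into a square distance matrix
matrix = squareform(condensed)
```

The Python callback is invoked once per pair, so this is probably only practical for modest array sizes; swapping `levenshtein` for `editdist.distance` should work the same way if that package is installed.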