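Python scikit-learn implementation of mutual information not working for partitions of different size

I want to compare two partitions/clusterings (P1 and P2) of a set S, where the partitions contain different numbers of blocks of different sizes. For example: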
S = [1, 2, 3, 4, 5, 6]
P1 = [[1, 2], [3,4], [5,6]]
P2 = [ [1,2,3,4], [5, 6]]
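From what I have read, mutual information could be one approach, and it is implemented in scikit-learn. The definition does not state that the partitions must be the same size (http://scikit-learn.org/stable/modules/generated/sklearn.metrics.mutual_info_score.html).
However, when I run my code, I get an error, apparently because of the different sizes: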
from sklearn import metrics
P1 = [[1, 2], [3,4], [5,6]]
P2 = [ [1,2,3,4], [5, 6]]
metrics.mutual_info_score(P1,P2)
ValueError                                Traceback (most recent call last)
<ipython-input-183-d5cb8d32ce7d> in <module>()
2 P2 = [ [1,2,3,4], [5, 6]]
3
----> 4 metrics.mutual_info_score(P1,P2)
/home/user/anaconda2/lib/python2.7/site-packages/sklearn/metrics/cluster/supervised.pyc in mutual_info_score(labels_true, labels_pred, contingency)
556 """
557 if contingency is None:
--> 558 labels_true, labels_pred = check_clusterings(labels_true, labels_pred)
559 contingency = contingency_matrix(labels_true, labels_pred)
560 contingency = np.array(contingency, dtype='float')
/home/user/anaconda2/lib/python2.7/site-packages/sklearn/metrics/cluster/supervised.pyc in check_clusterings(labels_true, labels_pred)
34 if labels_true.ndim != 1:
35 raise ValueError(
---> 36 "labels_true must be 1D: shape is %r" % (labels_true.shape,))
37 if labels_pred.ndim != 1:
38 raise ValueError(
ValueError: labels_true must be 1D: shape is (3, 2)
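Is there a way to use scikit-learn and mutual information to see how close these partitions are? If not, is there an alternative that does not use mutual information?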
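For reference, the error message says `labels_true` must be 1D, which suggests `mutual_info_score` expects one flat cluster label per element of S rather than nested lists of blocks. Below is a minimal sketch of that conversion, assuming every element of S belongs to exactly one block in each partition. `partition_to_labels` is a hypothetical helper written for illustration; it is not part of scikit-learn.

```python
from sklearn import metrics


def partition_to_labels(partition, elements):
    """Return, for each item in `elements`, the index of the block containing it.

    Hypothetical helper (not a scikit-learn function). Assumes every item
    appears in exactly one block of `partition`.
    """
    block_of = {}
    for block_index, block in enumerate(partition):
        for item in block:
            block_of[item] = block_index
    return [block_of[item] for item in elements]


S = [1, 2, 3, 4, 5, 6]
P1 = [[1, 2], [3, 4], [5, 6]]
P2 = [[1, 2, 3, 4], [5, 6]]

# Both label vectors have length len(S), even though P1 has three
# blocks and P2 has two.
labels_1 = partition_to_labels(P1, S)  # [0, 0, 1, 1, 2, 2]
labels_2 = partition_to_labels(P2, S)  # [0, 0, 0, 0, 1, 1]

mi = metrics.mutual_info_score(labels_1, labels_2)
nmi = metrics.normalized_mutual_info_score(labels_1, labels_2)
print(mi, nmi)
```

With this representation the two inputs always have the same length, so the number of blocks in each partition can differ freely. The normalized variant may be easier to interpret when comparing partitions with different numbers of blocks.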