2016-06-09

sklearn - Cannot call MultiLabelBinarizer's inverse_transform right after instantiation

After instantiating a MultiLabelBinarizer, I need its inverse_transform method for a matrix I built elsewhere. Unfortunately, this code:

import numpy as np 
from sklearn.preprocessing import MultiLabelBinarizer 
mlb = MultiLabelBinarizer(classes=['a', 'b', 'c']) 

A = np.array([[1, 0, 0], [1, 0, 1], [0, 1, 0], [1, 1, 1]]) 
y = mlb.inverse_transform(A) 

raises AttributeError: 'MultiLabelBinarizer' object has no attribute 'classes_'

I noticed that if I add this line right after instantiating mlb,

mlb.fit_transform([(c,) for c in ['a', 'b', 'c']]) 

the error goes away. I guess this is because fit_transform sets the classes_ attribute, but I expected that to happen at instantiation, since I passed a classes argument.

I'm using sklearn 0.17.1 and Python 2.7.6. What am I doing wrong?
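For context, the mapping that inverse_transform performs depends only on the column order of the classes, which is exactly what classes_ stores. The following is a minimal pure-Python sketch of that mapping; the helper name inverse_transform_sketch is hypothetical and not part of scikit-learn, and it skips the sparse-matrix handling and input validation the real method may do.

```python
def inverse_transform_sketch(matrix, classes):
    """Map each 0/1 row back to the tuple of labels whose column is set.

    `classes` fixes the column order, the role `classes_` plays inside
    MultiLabelBinarizer (hypothetical helper, for illustration only).
    """
    result = []
    for row in matrix:
        if len(row) != len(classes):
            raise ValueError("row length must equal the number of classes")
        # Keep the label of every column whose indicator is non-zero.
        result.append(tuple(c for c, flag in zip(classes, row) if flag))
    return result


A = [[1, 0, 0], [1, 0, 1], [0, 1, 0], [1, 1, 1]]
print(inverse_transform_sketch(A, ['a', 'b', 'c']))
# [('a',), ('a', 'c'), ('b',), ('a', 'b', 'c')]
```

Without a known class ordering there is no way to turn column indices back into labels, which is why the method needs classes_ to exist.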

Answers


If you want the classes_ attribute set on a MultiLabelBinarizer instance, you can also use a quick hack like this:

mlb = MultiLabelBinarizer().fit(['a', 'b', 'c']) 

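That works because, as marmouset said, only fit and fit_transform seem to set the classes_ attribute. Also, the documentation at scikit-learn.org (http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html) explicitly states that fit returns the MultiLabelBinarizer instance itself, which is why the call can be chained directly onto the constructor: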

def fit(self, y): 
    """Fit the label sets binarizer, storing `classes_` 
    Parameters 
    ---------- 
    y : iterable of iterables 
     A set of labels (any orderable and hashable object) for each 
     sample. If the `classes` parameter is set, `y` will not be 
     iterated. 
    Returns 
    ------- 
    self : returns this MultiLabelBinarizer instance 
    """ 
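A minimal sketch of the same workaround applied to the question's matrix, assuming a reasonably recent scikit-learn (I believe 0.17.1 behaves the same here, but have not checked every version). It passes classes to the constructor and then calls fit once so that classes_ exists. Wrapping the labels in a single sample, fit([labels]), rather than passing the flat list from the hack above, sidesteps a pitfall: when classes is not given, each string in a flat list is iterated character by character, so a label such as 'ab' would be split into 'a' and 'b'.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

labels = ['a', 'b', 'c']

# The constructor only stores `classes`; `classes_` is not created yet.
mlb = MultiLabelBinarizer(classes=labels)
print(hasattr(mlb, 'classes_'))  # False

# One sample containing every label is enough to run fit. Because
# `classes` was passed, the column order comes from `labels`.
mlb.fit([labels])

A = np.array([[1, 0, 0], [1, 0, 1], [0, 1, 0], [1, 1, 1]])
y = mlb.inverse_transform(A)
print(y)
```

This matches scikit-learn's general convention that attributes ending in an underscore are created by fit rather than by __init__.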

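Judging from the implementation at https://github.com/scikit-learn/scikit-learn/blob/51a765a/sklearn/preprocessing/label.py#L636, .fit is the only method that defines the classes_ attribute. The constructor does not set classes_ as a copy of classes, and given the definition in the class docstring below, that does not seem to be what was intended; you could report it to the authors.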

class MultiLabelBinarizer(BaseEstimator, TransformerMixin): 
    """Transform between iterable of iterables and a multilabel format 
    Although a list of sets or tuples is a very intuitive format for multilabel 
    data, it is unwieldy to process. This transformer converts between this 
    intuitive format and the supported multilabel format: a (samples x classes) 
    binary matrix indicating the presence of a class label. 
    Parameters 
    ---------- 
    classes : array-like of shape [n_classes] (optional) 
     Indicates an ordering for the class labels 
    sparse_output : boolean (default: False), 
     Set to true if output binary array is desired in CSR sparse format 
    Attributes 
    ---------- 
    classes_ : array of labels 
     A copy of the `classes` parameter where provided, 
     or otherwise, the sorted set of classes found when fitting.