Retrieving intermediate features from a pipeline in Scikit (Python)

I am using a pipeline very similar to the one given in this example:

>>> text_clf = Pipeline([('vect', CountVectorizer()), 
...      ('tfidf', TfidfTransformer()), 
...      ('clf', MultinomialNB()), 
... ])

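over which I use GridSearchCV to find the best estimator over a parameter grid.

However, I would like to get the column names of my training set from the get_feature_names() method of CountVectorizer(). Is this possible without running CountVectorizer() outside the pipeline?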


2015-10-12 Tanguy

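Using the get_params() function, you can access the various parts of the pipeline and their respective internal parameters. Here is an example of accessing 'vect':

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# Build the (unfitted) pipeline and look up the 'vect' step by name
text_clf = Pipeline([('vect', CountVectorizer()),
        ('tfidf', TfidfTransformer()),
        ('clf', MultinomialNB())])
print(text_clf.get_params()['vect'])

yields (for me)

CountVectorizer(analyzer=u'word', binary=False, decode_error=u'strict',
    dtype=<type 'numpy.int64'>, encoding=u'utf-8', input=u'content',
    lowercase=True, max_df=1.0, max_features=None, min_df=1,
    ngram_range=(1, 1), preprocessor=None, stop_words=None,
    strip_accents=None, token_pattern=u'(?u)\\b\\w\\w+\\b',
    tokenizer=None, vocabulary=None)

I haven't fitted the pipeline to any data in this example, so calling get_feature_names() at this point will return an error.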
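
Once the pipeline has been fitted, the same get_params() lookup should return the fitted vectorizer, so its feature names become available. The following is only a minimal sketch: the tiny corpus (`docs`, `labels`) is made up for illustration, and it assumes that recent scikit-learn releases expose the method as get_feature_names_out(), falling back to get_feature_names() on older ones.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data, only for illustration
docs = ["the cat sat", "the dog barked", "cat and dog"]
labels = [0, 1, 0]

text_clf = Pipeline([('vect', CountVectorizer()),
                     ('tfidf', TfidfTransformer()),
                     ('clf', MultinomialNB())])
text_clf.fit(docs, labels)

# After fitting, the 'vect' entry is the fitted CountVectorizer
vect = text_clf.get_params()['vect']

# Newer scikit-learn versions use get_feature_names_out();
# older ones only provide get_feature_names()
if hasattr(vect, 'get_feature_names_out'):
    feature_names = [str(name) for name in vect.get_feature_names_out()]
else:
    feature_names = [str(name) for name in vect.get_feature_names()]

print(feature_names)
```

The names come back in column order, so they line up with the columns of the matrix that the vectorizer passes on to the next step.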


2015-10-12 18:47:01 NBartley

For reference:

The estimators of a pipeline are stored as a list in the steps attribute:

>>> clf.steps[0]
('reduce_dim', PCA(copy=True, n_components=None, whiten=False))

and as a dict in named_steps:

>>> clf.named_steps['reduce_dim']
PCA(copy=True, n_components=None, whiten=False)

From http://scikit-learn.org/stable/modules/pipeline.html
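
Applied to the question's setup, the same attributes can be read from the best estimator found by GridSearchCV. This is a rough sketch rather than a definitive recipe: the corpus, labels, and parameter grid are invented for illustration, and it assumes the default refit=True so that best_estimator_ is a fitted pipeline.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import GridSearchCV

# Hypothetical toy data, only for illustration
docs = ["the cat sat", "a cat purred", "cat on the mat", "my cat sleeps",
        "the dog barked", "a dog ran", "dog in the park", "my dog eats"]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

text_clf = Pipeline([('vect', CountVectorizer()),
                     ('tfidf', TfidfTransformer()),
                     ('clf', MultinomialNB())])

# Hypothetical grid; step parameters are addressed as <step>__<param>
param_grid = {'vect__ngram_range': [(1, 1), (1, 2)],
              'clf__alpha': [0.1, 1.0]}

search = GridSearchCV(text_clf, param_grid, cv=2)
search.fit(docs, labels)

# best_estimator_ is the pipeline refitted on all data with the best parameters
best = search.best_estimator_
first_name, first_step = best.steps[0]   # access by position
vect = best.named_steps['vect']          # access by name

# Newer scikit-learn versions use get_feature_names_out()
if hasattr(vect, 'get_feature_names_out'):
    names = [str(n) for n in vect.get_feature_names_out()]
else:
    names = [str(n) for n in vect.get_feature_names()]

print(first_name, len(names))
```

Because the vectorizer is fitted inside the pipeline, nothing has to be run outside it to recover the column names.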


2015-12-31 15:33:05 AbtPst
