Confused about OneHotEncoder in scikit-learn

I am using Python 2.7 (miniconda interpreter). I am confused by the OneHotEncoder example below: why is the output of enc.n_values_ equal to [2, 3, 4]? It would be great if someone could help clarify.

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html

>>> from sklearn.preprocessing import OneHotEncoder 
>>> enc = OneHotEncoder() 
>>> enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]]) 
OneHotEncoder(categorical_features='all', dtype=<... 'float'>, 
     handle_unknown='error', n_values='auto', sparse=True) 
>>> enc.n_values_ 
array([2, 3, 4]) 
>>> enc.feature_indices_ 
array([0, 2, 5, 9]) 
>>> enc.transform([[0, 1, 1]]).toarray() 
array([[ 1., 0., 0., 1., 0., 0., 1., 0., 0.]])

Regards, Lin

Source

2016-08-22 Lin Ma

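n_values_ is the number of values for each feature (inferred from the training data here, since n_values='auto').

In this example (X has shape [n_samples, n_feature]):

For the first feature, there are 2 values: 0, 1.

For the second feature, there are 3 values: 0, 1, 2.

For the third feature, there are 4 values: 0, 1, 2, 3.

Therefore, enc.n_values_ is [2, 3, 4].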
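The counting above can be reproduced by hand without scikit-learn. The following is only a minimal sketch in plain Python, not the library's actual implementation. It assumes non-negative integer features and that the 'auto' setting takes the range of each column as max + 1. The names columns, n_values, feature_indices and one_hot are made up for illustration.

```python
# Training data from the question: 4 samples (rows), 3 features (columns).
X = [[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]]

# zip(*X) transposes the rows into one tuple per feature column.
columns = list(zip(*X))

# Assumption: the value range of each feature is max + 1
# (features are non-negative integers starting at 0).
n_values = [max(col) + 1 for col in columns]

# Running total of n_values: the offset where each feature's
# block of columns starts in the encoded output.
feature_indices = [0]
for n in n_values:
    feature_indices.append(feature_indices[-1] + n)


def one_hot(sample):
    """Encode one sample by hand using the offsets computed above."""
    row = [0.0] * feature_indices[-1]
    for start, value in zip(feature_indices, sample):
        row[start + value] = 1.0
    return row


print(n_values)            # [2, 3, 4]
print(feature_indices)     # [0, 2, 5, 9]
print(one_hot([0, 1, 1]))  # [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

The last line matches the result of enc.transform([[0, 1, 1]]).toarray() in the question: the first 2 columns encode the first feature, the next 3 the second, and the last 4 the third.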

Source

2016-08-22 04:12:28 yangjie

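Thanks yangjie, so the 3 samples are '[0,1,0,1]', '[0,1,2,0]' and '[3,0,1,2]'?

I am also confused about '[n_samples, n_feature]'. I thought it meant 'n_samples' rows and 'n_feature' columns, but that does not seem to be the case. It would be great if you could clarify. :)

It is 'n_samples' rows and 'n_feature' columns. X has 4 samples, and each sample has 3 features. – yangjie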
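To make the rows-versus-columns point concrete, here is a small sketch in plain Python (the variable names are only illustrative). It shows that the three lists mentioned in the comment are the feature columns of X, not its samples.

```python
# Data passed to enc.fit() in the question.
X = [[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]]

n_samples = len(X)     # number of rows
n_feature = len(X[0])  # number of columns

# Each row of X is one sample.
samples = X

# Transposing X gives one list per feature (column).
features = [list(col) for col in zip(*X)]

print(n_samples)  # 4
print(n_feature)  # 3
print(features)   # [[0, 1, 0, 1], [0, 1, 2, 0], [3, 0, 1, 2]]
```

So '[0,1,0,1]', '[0,1,2,0]' and '[3,0,1,2]' are the values taken by the first, second and third feature across the 4 samples.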
