Untransform after OneHotEncoder

I am using sklearn's OneHotEncoder, but I would like to un-transform my data afterwards. Any ideas on how to do this?

>>> from sklearn.preprocessing import OneHotEncoder 
>>> enc = OneHotEncoder() 
>>> enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]]) 
>>> enc.n_values_ 
array([2, 3, 4]) 
>>> enc.feature_indices_ 
array([0, 2, 5, 9]) 
>>> enc.transform([[0, 1, 1]]).toarray() 
array([[ 1., 0., 0., 1., 0., 0., 1., 0., 0.]])

But I would like to be able to do the following:

>>> enc.untransform(array([[ 1., 0., 0., 1., 0., 0., 1., 0., 0.]])) 
[[0, 1, 1]]

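How would I go about doing this?

For context, I have built a neural network that learns the one-hot encoded space, and I now want to use the network to make real predictions, which need to be in the original data format.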

Source

2016-06-08 kmace

I noticed that sklearn.feature_extraction.DictVectorizer has an inverse_transform method. – kmace

Just found this answer; it is quite detailed, and it may help you: http://stackoverflow.com/questions/22548731/how-to-reverse-sklearn-onehotencoder-transform-to-recover-original-data

To reverse a single one-hot encoded item
See: https://stackoverflow.com/a/39686443/7671913

from sklearn.preprocessing import OneHotEncoder
import numpy as np

orig = np.array([6, 9, 8, 2, 5, 4, 5, 3, 3, 6])

ohe = OneHotEncoder()
encoded = ohe.fit_transform(orig.reshape(-1, 1))  # input needs to be column-wise

# active_features_ holds the original value behind each one-hot column.
# It is a legacy attribute and is not available in newer scikit-learn releases.
decoded = encoded.dot(ohe.active_features_).astype(int)
assert np.allclose(orig, decoded)
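
The snippet above depends on `active_features_`, which only exists in older scikit-learn releases. As a minimal sketch, assuming a reasonably recent scikit-learn where `OneHotEncoder.inverse_transform` is available, the round trip can be written without any legacy attributes. The data is the array from the question; the names `X`, `row` and `recovered` are only for illustration.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Same training data as in the question
X = np.array([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])

enc = OneHotEncoder()
encoded = enc.fit_transform(X)  # sparse matrix with 2 + 3 + 4 = 9 columns

# inverse_transform maps one-hot rows back to the original categories
decoded = enc.inverse_transform(encoded)
assert np.array_equal(decoded, X)

# Round trip for the single sample from the question
row = enc.transform([[0, 1, 1]])
recovered = enc.inverse_transform(row)
print(recovered)
```

This only applies to rows that are valid one-hot encodings. Real-valued network outputs would need to be converted to one-hot form first, for example by taking the largest entry within each feature's block.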

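To reverse an array of one-hot encoded items (as noted in the comments)
See: How to reverse sklearn.OneHotEncoder transform to recover original data?

Given a sklearn.OneHotEncoder instance called ohc, the encoded data (a scipy.sparse.csr_matrix) returned by ohc.fit_transform or ohc.transform called out, and the shape of the original data (n_samples, n_features), the original data X can be recovered with: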

recovered_X = np.array(
    [ohc.active_features_[col] for col in out.sorted_indices().indices]
).reshape(n_samples, n_features) - ohc.feature_indices_[:-1]
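
For the neural-network use case in the question, the outputs are usually real-valued scores rather than exact 0/1 rows. The following is a rough NumPy-only sketch, not part of the original answer, that takes the argmax within each feature's block of columns. The helper name `decode_one_hot` and the `block_sizes` argument are hypothetical. It assumes the categories of each feature are the integers 0..n-1, as in the question's example where the block sizes are [2, 3, 4].

```python
import numpy as np

def decode_one_hot(rows, block_sizes):
    """Decode concatenated one-hot (or score) blocks by per-block argmax.

    Hypothetical helper: assumes feature i takes the values
    0 .. block_sizes[i] - 1, so the argmax position is the original value.
    """
    rows = np.asarray(rows, dtype=float)
    # Column offsets of each block, e.g. [0, 2, 5, 9] for sizes [2, 3, 4]
    bounds = np.concatenate(([0], np.cumsum(block_sizes)))
    decoded = [
        rows[:, start:stop].argmax(axis=1)
        for start, stop in zip(bounds[:-1], bounds[1:])
    ]
    return np.stack(decoded, axis=1)

block_sizes = [2, 3, 4]

# Exact one-hot row from the question
exact = [[1., 0., 0., 1., 0., 0., 1., 0., 0.]]
print(decode_one_hot(exact, block_sizes))  # [[0 1 1]]

# Soft scores, as a network might produce
soft = [[0.9, 0.1, 0.2, 0.7, 0.1, 0.1, 0.6, 0.2, 0.1]]
print(decode_one_hot(soft, block_sizes))   # [[0 1 1]]
```

If the original categories are not consecutive integers starting at zero, the argmax positions would still need to be mapped back through the encoder's stored categories.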

Source

2018-01-22 15:28:09 bmjrowe
