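Python - Feature hashing of lists of strings

I'd like to take a list of dictionaries (records) in which some columns have a list of values as the cell value. Here is an example: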
[{'fruit': 'apple', 'age': 27}, {'fruit':['apple', 'banana'], 'age': 32}]
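How can I take this input and apply feature hashing to it (my dataset has thousands of columns)? Currently I'm using one-hot encoding, but it consumes a lot of memory (more than my system has).
I tried passing my dataset in the form shown above and got an error: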
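Feature hashing (the "hashing trick") avoids building a vocabulary: each feature name is hashed straight into one of a fixed number of columns, so memory is bounded by the number of columns rather than by how many distinct categories exist. Below is a minimal pure-Python sketch of the idea, assuming features are already string tokens of the form "name=value". The helper name hash_features is hypothetical, and zlib.crc32 is only a stand-in; sklearn's FeatureHasher uses MurmurHash3 internally.

```python
import zlib
from collections import defaultdict


def hash_features(tokens, n_features=2**10):
    """Hash string tokens into a sparse vector stored as {column_index: value}.

    Illustrative sketch only: zlib.crc32 stands in for a proper hash such as
    MurmurHash3. One bit of the hash picks a +1/-1 sign so that colliding
    tokens tend to cancel out instead of always adding up.
    """
    vec = defaultdict(float)
    for tok in tokens:
        h = zlib.crc32(tok.encode("utf-8"))      # unsigned 32-bit hash
        index = h % n_features                   # column this token lands in
        sign = 1.0 if ((h >> 31) & 1) == 0 else -1.0
        vec[index] += sign
    return dict(vec)


# The second example record, expressed as tokens (age left out for brevity)
row = hash_features(["fruit=apple", "fruit=banana"], n_features=16)
print(row)
```

Whatever the number of distinct fruits, every row fits in n_features columns; the price is that unrelated tokens can occasionally share a column.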
x__ = h.transform(data)
Traceback (most recent call last):
File "<ipython-input-14-db4adc5ec623>", line 1, in <module>
x__ = h.transform(data)
File "/usr/local/lib/python2.7/dist-packages/sklearn/feature_extraction/hashing.py", line 142, in transform
_hashing.transform(raw_X, self.n_features, self.dtype)
File "sklearn/feature_extraction/_hashing.pyx", line 52, in sklearn.feature_extraction._hashing.transform (sklearn/feature_extraction/_hashing.c:2103)
TypeError: a float is required
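This TypeError is most likely raised because FeatureHasher's default input_type='dict' expects every value to be a number (a single string value is also accepted and hashed as a "name=value" feature with value 1), and a list such as ['apple', 'banana'] is neither. One way around it, sketched here for Python 3, is to expand each list into one indicator key per element before hashing. flatten_record is a hypothetical helper, not part of sklearn.

```python
def flatten_record(record):
    """Expand list-valued fields into one indicator feature per element.

    {'fruit': ['apple', 'banana'], 'age': 32}
      -> {'fruit=apple': 1, 'fruit=banana': 1, 'age': 32}
    Single strings become 'name=value' indicators; numbers are kept as-is.
    """
    flat = {}
    for name, value in record.items():
        if isinstance(value, (list, tuple, set)):
            for item in value:
                flat["%s=%s" % (name, item)] = 1
        elif isinstance(value, str):
            flat["%s=%s" % (name, value)] = 1
        else:
            flat[name] = value
    return flat


data = [{'fruit': 'apple', 'age': 27}, {'fruit': ['apple', 'banana'], 'age': 32}]
flat_data = [flatten_record(r) for r in data]
print(flat_data)
```

The flattened dicts contain only numeric values, so they can be passed to something like FeatureHasher(n_features=2**20, input_type='dict').transform(flat_data), which returns a scipy.sparse matrix.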
I also tried converting it into a DataFrame and passing that to the hasher:
x__ = h.transform(x_y_dataframe)
Traceback (most recent call last):
File "<ipython-input-15-109e7f8018f3>", line 1, in <module>
x__ = h.transform(x_y_dataframe)
File "/usr/local/lib/python2.7/dist-packages/sklearn/feature_extraction/hashing.py", line 142, in transform
_hashing.transform(raw_X, self.n_features, self.dtype)
File "sklearn/feature_extraction/_hashing.pyx", line 46, in sklearn.feature_extraction._hashing.transform (sklearn/feature_extraction/_hashing.c:1928)
File "/usr/local/lib/python2.7/dist-packages/sklearn/feature_extraction/hashing.py", line 138, in <genexpr>
raw_X = (_iteritems(d) for d in raw_X)
File "/usr/local/lib/python2.7/dist-packages/sklearn/feature_extraction/hashing.py", line 15, in _iteritems
return d.iteritems() if hasattr(d, "iteritems") else d.items()
AttributeError: 'unicode' object has no attribute 'items'
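The AttributeError is consistent with how pandas iterates a DataFrame: looping over it yields the column labels (strings), not rows, so the hasher receives 'fruit' where it expects a mapping. A small sketch, assuming pandas is available, showing the iteration behaviour and DataFrame.to_dict(orient='records'), which produces one dict per row:

```python
import pandas as pd

df = pd.DataFrame([{'fruit': 'apple', 'age': 27},
                   {'fruit': ['apple', 'banana'], 'age': 32}])

# Iterating a DataFrame yields its column labels, not its rows
column_labels = [d for d in df]
print(column_labels)

# One dict per row: the shape FeatureHasher's input_type='dict' expects
records = df.to_dict(orient='records')
print(records)
```

The records still contain list values, so the list-valued cells have to be expanded (for example into 'fruit=apple'-style keys) before hashing.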
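Any ideas how I can achieve this with pandas or sklearn? Or could I perhaps build the dummy variables a few thousand rows at a time?
Here is how I get my dummy variables using pandas: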
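Processing a few thousand rows at a time fits feature hashing well: the hash needs no fitted vocabulary, so every batch maps onto the same columns. With get_dummies, by contrast, each batch only gets columns for the categories it happens to contain, so the batches would not line up. A small batching helper, sketched under that assumption (chunks is a hypothetical name):

```python
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


for batch in chunks(range(7), 3):
    print(batch)
```

Each batch of records could then be flattened and passed to FeatureHasher.transform, and the resulting sparse matrices stacked with scipy.sparse.vstack.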
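import pandas

def one_hot_encode(x, categorical_labels):
    # x is the DataFrame holding the categorical columns
    res = []
    for col in categorical_labels:
        # Cells hold lists rendered as strings such as "['apple', 'banana']";
        # strip the brackets and split on ', ' (str.get_dummies takes no prefix)
        v = x[col].astype(str).str.strip('[]').str.get_dummies(', ')
        if len(res) == 2:
            # Merge the pieces collected so far into a single frame
            res = [pandas.concat(res, axis=1)]
        # Keep this column's dummies in every iteration
        res.append(v)
    return pandas.concat(res, axis=1)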
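On the "can't set a prefix" comment: Series.str.get_dummies indeed has no prefix argument, but DataFrame.add_prefix can add one afterwards, which also keeps dummy columns from different source columns apart. Joining list cells directly, instead of going through astype(str), also avoids stray quote characters such as "'apple'" in the column names. A sketch assuming the cells hold real Python lists (dummies_with_prefix is a hypothetical helper):

```python
import pandas as pd


def dummies_with_prefix(x, col, sep='|'):
    # Join list cells with sep (scalars become plain strings), let
    # Series.str.get_dummies split on sep, then prefix with the column name.
    joined = x[col].apply(
        lambda v: sep.join(map(str, v)) if isinstance(v, list) else str(v))
    return joined.str.get_dummies(sep=sep).add_prefix(col + '_')


x = pd.DataFrame([{'fruit': 'apple', 'age': 27},
                  {'fruit': ['apple', 'banana'], 'age': 32}])
d = dummies_with_prefix(x, 'fruit')
print(d)
```

This still builds a dense column per category, so it does not solve the memory problem by itself; it only makes the one-hot output cleaner.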
You can convert the lists to tuples, which are hashable. – IanS
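Following the comment above, a sketch of converting list cells to tuples (lists_to_tuples is a hypothetical helper). This makes the values usable wherever Python's hash() is needed, such as dict keys or category labels, but each tuple then counts as one category (('apple', 'banana') is distinct from 'apple'), and a tuple is still not a number, so on its own it does not satisfy FeatureHasher's dict input.

```python
def lists_to_tuples(records):
    """Return copies of the records with list values converted to tuples."""
    return [{k: tuple(v) if isinstance(v, list) else v for k, v in r.items()}
            for r in records]


data = [{'fruit': 'apple', 'age': 27}, {'fruit': ['apple', 'banana'], 'age': 32}]
converted = lists_to_tuples(data)
print(converted)
```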