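How to generate a pandas DataFrame column of Categorical from a string column?

I can convert a pandas string column to Categorical, but when I try to insert it as a new DataFrame column it seems to get converted right back to a Series of str: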

train['LocationNFactor'] = pd.Categorical.from_array(train['LocationNormalized']) 

>>> type(pd.Categorical.from_array(train['LocationNormalized'])) 
<class 'pandas.core.categorical.Categorical'> 
# however it got converted back to... 
>>> type(train['LocationNFactor'][2]) 
<type 'str'> 
>>> train['LocationNFactor'][2] 
'Hampshire'

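I'm guessing this is because Categorical doesn't map to any numpy dtype, so I'd have to convert it to some int type and thus lose the factor labels <-> levels association? What's the most elegant workaround to store the levels <-> labels association and keep the ability to convert? (Just store a dict like here, and convert manually when needed?) I think Categorical is still not a first-class datatype for DataFrame, unlike R.

(Using pandas 0.10.1, numpy 1.6.2, python 2.7.3 - the latest MacPorts versions of everything.)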

2013-03-12 smci

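The only workaround I found for pandas before 0.15 is as follows:

The column must be converted to a Categorical explicitly, but numpy will immediately coerce the levels back to int, losing the factor information.
So store the factor in a global variable outside the DataFrame.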

train_LocationNFactor = pd.Categorical.from_array(train['LocationNormalized']) # default order: alphabetical 

train['LocationNFactor'] = train_LocationNFactor.labels # insert in dataframe
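
For anyone who wants to try the same idea on a current pandas, here is a minimal sketch using `pd.factorize` in place of `Categorical.from_array`. That is a substitution on my part, not what the code above does, and the sample data is made up to stand in for `train['LocationNormalized']`.

```python
import pandas as pd

# Hypothetical sample data standing in for train['LocationNormalized'].
train = pd.DataFrame(
    {"LocationNormalized": ["London", "Hampshire", "Hampshire", "Leeds"]}
)

# sort=True orders the levels alphabetically, like the default order noted above.
codes, levels = pd.factorize(train["LocationNormalized"], sort=True)

# Only the integer codes go into the DataFrame.
train["LocationNFactor"] = codes

# `levels` is an Index kept outside the DataFrame; indexing it with the
# stored codes recovers the original strings.
restored = levels[train["LocationNFactor"].to_numpy()]
print(list(restored))
```

The drawback is the same as in the workaround above: `levels` has to be carried around separately from the DataFrame.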

[Update: pandas 0.15+ added decent support for Categorical]
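
A minimal sketch of what that looks like on pandas 0.15 or later, assuming the column is converted with `astype("category")`. The sample data is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for train['LocationNormalized'].
train = pd.DataFrame(
    {"LocationNormalized": ["London", "Hampshire", "Hampshire", "Leeds"]}
)

# The column keeps its categorical dtype after assignment.
train["LocationNFactor"] = train["LocationNormalized"].astype("category")

factor = train["LocationNFactor"]
print(factor.dtype)                  # the column dtype, not the element type
print(list(factor.cat.categories))   # the levels
print(list(factor.cat.codes))        # the integer code for each row
```

Note that a single element such as `train['LocationNFactor'][2]` is still a plain string here; the categorical information lives on the column and is reached through the `.cat` accessor.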

2013-08-04 06:19:42 smci

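The labels <-> levels mapping is stored in an Index object.

To convert an integer array to a string array: index[integer_array]
To convert a string array to an integer array: index.get_indexer(string_array)

Here is an example: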

In [56]: c = pd.Categorical.from_array(['a', 'b', 'c', 'd', 'e'])
    ...: idx = c.levels

In [57]: idx[[1,2,1,2,3]]
Out[57]: Index([b, c, b, c, d], dtype=object)

In [58]: idx.get_indexer(["a","c","d","e","a"])
Out[58]: array([0, 2, 3, 4, 0])
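
The same round trip as a sketch for more recent pandas, where, as far as I know, `from_array` and `levels` are no longer available: the constructor is called directly and the Index is exposed as `categories`.

```python
import pandas as pd

c = pd.Categorical(["a", "b", "c", "d", "e"])
idx = c.categories  # the Index that the old API above calls `levels`

# Integer codes -> strings: index the Index with the codes.
labels = idx[[1, 2, 1, 2, 3]]

# Strings -> integer codes: look each string up in the Index.
codes = idx.get_indexer(["a", "c", "d", "e", "a"])

print(list(labels))
print(list(codes))
```

`get_indexer` returns -1 for any string that is not among the categories, so it is worth checking for that before using the result as codes.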

2013-03-12 12:49:15 HYRY

I know that, but the issue here is that it all gets clobbered back to str when we assign it to a DataFrame column, like I showed: `train['LocationNFactor'] = pd.Categorical...` – smci 2013-03-12 19:47:59
