Performance with pandas

This is a follow-up to a question I originally asked on Data Science: Performance with pandas

I have a table formatted like this:
Feature  amount  ID
Feat1    2       1
Feat2    0       1
Feat3    0       1
Feat4    1       1
Feat2    2       2
Feat4    0       2
Feat3    0       2
Feat6    1       2
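Say I have 200 distinct IDs. I want to turn every distinct feature into a variable (a column) and every ID into an observation, so all rows that share an ID are merged into a single row. For example: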
Feat1  Feat2  Feat3  Feat4  Feat5  Feat6  ID
2      0      0      1      NA     NA     1
NA     2      0      0      NA     1      2
Is there a good way to do this in Python (pandas) or R?

Here is the solution I came up with:

import pandas as pd

# `data` is the long-format table above, with an additional Location column
newdata = pd.DataFrame(columns=['ID', 'Location', 'Feat1', 'Feat2', 'Feat3', 'Feat4', 'Feat5', 'Feat6'])
grouped = data.groupby(['ID', 'Location'])

# One output row per (ID, Location) group; each feature's amount is written cell by cell
for index, (group_name, d) in enumerate(grouped):
    newdata.loc[index, 'ID'] = group_name[0]
    newdata.loc[index, 'Location'] = group_name[1]
    for feature, amount in zip(d['Feature'], d['amount']):
        newdata.loc[index, feature] = amount

After more googling, I found this question, whose answer says:

So try to avoid the Python loop for i, row in enumerate(...) entirely.

I'd like to know: for my original problem, is there a more efficient approach?

2016-03-04 chchannn

I believe this is what you're after:

>>> df.pivot_table(values='amount', index='ID', columns='Feature')
Feature  Feat1  Feat2  Feat3  Feat4  Feat6
ID
1            2      0      0      1    NaN
2          NaN      2      0      0      1
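A minimal sketch for checking that pivot_table reproduces the question's loop, assuming the sample table above with the Location column omitted. ALL_FEATURES is a hypothetical list of expected feature names: pivot_table only creates columns for features that actually occur in the data, so Feat5 is added back with reindex to match the layout the question asks for.

```python
import pandas as pd

# Sample long-format table from the question (Location column omitted).
data = pd.DataFrame({
    "Feature": ["Feat1", "Feat2", "Feat3", "Feat4", "Feat2", "Feat4", "Feat3", "Feat6"],
    "amount": [2, 0, 0, 1, 2, 0, 0, 1],
    "ID": [1, 1, 1, 1, 2, 2, 2, 2],
})
# Hypothetical: the full feature list is assumed to be known in advance,
# like the preallocated columns in the question's loop.
ALL_FEATURES = ["Feat1", "Feat2", "Feat3", "Feat4", "Feat5", "Feat6"]


def wide_with_loop(df):
    """The question's approach: fill a DataFrame one .loc cell at a time."""
    out = pd.DataFrame(columns=["ID"] + ALL_FEATURES)
    for index, (group_id, d) in enumerate(df.groupby("ID")):
        out.loc[index, "ID"] = group_id
        for feature, amount in zip(d["Feature"], d["amount"]):
            out.loc[index, feature] = amount
    # Preallocated columns start as object dtype; normalise for comparison.
    wide = out.set_index("ID").astype("float64")
    wide.index = wide.index.astype("int64")
    return wide


def wide_with_pivot(df):
    """One vectorized reshape; reindex adds features absent from the data."""
    wide = df.pivot_table(values="amount", index="ID", columns="Feature")
    return wide.reindex(columns=ALL_FEATURES)


print(wide_with_pivot(data))
```

Both functions build the same table, but the loop performs one .loc write per cell and enlarges the frame for every new ID, whereas pivot_table groups and reshapes in a single vectorized call, which is why it tends to scale much better as the number of IDs grows.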

Depending on your data and what you need, there are variations. For example:

>>> df.pivot_table(values='amount', index='ID', columns='Feature',
...                aggfunc='sum', fill_value=0)
Feature  Feat1  Feat2  Feat3  Feat4  Feat6
ID
1            2      0      0      1      0
2            0      2      0      0      1
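The aggfunc argument only matters when the same (ID, Feature) pair appears more than once. A minimal sketch of that distinction, assuming the sample table from the question plus one hypothetical duplicate row (ID 1 reporting Feat1 a second time): DataFrame.pivot, or the equivalent set_index plus unstack, reshapes without aggregating and raises ValueError on duplicates, while pivot_table combines them.

```python
import pandas as pd

df = pd.DataFrame({
    "Feature": ["Feat1", "Feat2", "Feat3", "Feat4", "Feat2", "Feat4", "Feat3", "Feat6"],
    "amount": [2, 0, 0, 1, 2, 0, 0, 1],
    "ID": [1, 1, 1, 1, 2, 2, 2, 2],
})

# With at most one row per (ID, Feature) pair nothing needs aggregating:
# DataFrame.pivot and set_index + unstack produce the same wide table.
wide = df.pivot(index="ID", columns="Feature", values="amount")
same = df.set_index(["ID", "Feature"])["amount"].unstack()

# Hypothetical duplicate: ID 1 reports Feat1 a second time.
extra = pd.DataFrame({"Feature": ["Feat1"], "amount": [3], "ID": [1]})
dup = pd.concat([df, extra], ignore_index=True)

# pivot does not know how to combine duplicates, so it raises ValueError.
try:
    dup.pivot(index="ID", columns="Feature", values="amount")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("pivot:", err)

# pivot_table combines duplicates with aggfunc (here 2 + 3 = 5 for ID 1, Feat1),
# and fill_value=0 replaces the cells that had no data at all.
summed = dup.pivot_table(values="amount", index="ID", columns="Feature",
                         aggfunc="sum", fill_value=0)
print(summed)
```

If your data is guaranteed to have no duplicates, pivot avoids the grouping step and fails loudly if that assumption is ever violated.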

2016-03-04 02:03:36 Alexander

Thanks for the reply. I removed the Location column, since it isn't a factor in this problem. – chchannn
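If the Location column does need to be kept, pivot_table accepts a list for index, which mirrors data.groupby(['ID', 'Location']) in the question's loop. This is a minimal sketch; the Location labels "A" and "B" are hypothetical, since the sample table does not show that column.

```python
import pandas as pd

# Hypothetical Location values: one made-up label per ID.
df = pd.DataFrame({
    "Feature": ["Feat1", "Feat2", "Feat3", "Feat4", "Feat2", "Feat4", "Feat3", "Feat6"],
    "amount": [2, 0, 0, 1, 2, 0, 0, 1],
    "ID": [1, 1, 1, 1, 2, 2, 2, 2],
    "Location": ["A", "A", "A", "A", "B", "B", "B", "B"],
})

# Each observed (ID, Location) pair becomes one output row.
wide = df.pivot_table(values="amount", index=["ID", "Location"], columns="Feature")

# reset_index moves ID and Location out of the MultiIndex into ordinary columns,
# matching the flat layout the loop produced.
flat = wide.reset_index()
print(flat)
```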

Works! Thanks, pandas! – chchannn
