How do I create new columns to store the data from rows with duplicate IDs?

I have this dataframe:

ID key 
0 1 A 
1 1 B 
2 2 C 
3 3 D 
4 3 E 
5 3 E

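I want to create additional key columns, as many as necessary, to store the data when there are duplicate IDs.

Here is a snippet of the output showing the key columns: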

ID key key2 
0 1 A  B # Note: ID#1 appeared twice in the dataframe, so the key value "B" 
       # associated with the duplicate ID will be stored in the new column "key2"

The complete output should look like this:

ID key key2 key3 
0 1 A  B NaN 
1 2 C NaN NaN 
2 3 D  E  E # ID#3 is repeated three times. The key of the
         # second occurrence, "E", will be stored under the "key2" column
         # and the key of the third occurrence, "E", in the new column "key3"

Any suggestions or ideas on how I should approach this problem?

Thanks,

Source

2016-08-03 MEhsan

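Check out groupby and apply; both are covered in the pandas documentation. Returning a Series from apply creates an extra level in the resulting MultiIndex, which you can then unstack into columns.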

df.groupby('ID')['key'].apply(
    lambda s: pd.Series(s.values, index=['key_%s' % i for i in range(s.shape[0])]) 
).unstack(-1)

Output

key_0 key_1 key_2 
ID     
1  A  B None 
2  C None None 
3  D  E  E

If you want ID as a column, you can call reset_index on this dataframe.

Source
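
For reference, here is a self-contained sketch of the snippet above with reset_index added at the end. The sample dataframe is rebuilt from the question, and the variable name wide is only illustrative. Recent pandas versions may display the missing entries as NaN rather than None.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"ID": [1, 1, 2, 3, 3, 3],
                   "key": ["A", "B", "C", "D", "E", "E"]})

wide = (
    df.groupby("ID")["key"]
      # One Series per ID, labelled key_0, key_1, ... in order of appearance
      .apply(lambda s: pd.Series(s.values,
                                 index=["key_%s" % i for i in range(s.shape[0])]))
      # Move the key_N labels from the inner index level into columns
      .unstack(-1)
      # Turn the ID index back into an ordinary column
      .reset_index()
)
print(wide)
```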

2016-08-03 04:22:33 Alex

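This is amazing! Is it possible to make the code handle the same dataframe, but with an additional column 'AlterKey', so that the dataframe has 3 columns in total ('ID', 'key' and 'AlterKey')? How should I modify the code to make it work? @Alex – MEhsan

I mean, how do I apply the 'lambda' function to the new column 'AlterKey'? Thanks, @Alex – MEhsan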
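
One possible way to handle an extra column, sketched here under assumptions: instead of the lambda, this numbers each row within its ID using cumcount and unstacks all value columns at once. The AlterKey values are made up for illustration, and the key_N / AlterKey_N column names are just one naming choice.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3],
    "key": ["A", "B", "C", "D", "E", "E"],
    "AlterKey": ["a", "b", "c", "d", "e", "f"],  # hypothetical values
})

# Position of each row within its ID group: 0, 1, 2, ...
n = df.groupby("ID").cumcount()

# Index by (ID, position), then move the position level into the columns;
# every remaining column ('key', 'AlterKey') is spread out in one step
wide = df.set_index(["ID", n]).unstack()

# Flatten the (column, position) MultiIndex into names such as key_0, AlterKey_2
wide.columns = ["%s_%s" % (col, i) for col, i in wide.columns]
wide = wide.reset_index()
print(wide)
```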

You can use cumcount with pivot_table:

df['cols'] = 'key' + df.groupby('ID').cumcount().astype(str) 
print (df.pivot_table(index='ID', columns='cols', values='key', aggfunc=''.join)) 
cols key0 key1 key2 
ID     
1  A  B None 
2  C None None 
3  D  E  E
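
A self-contained variant of the same idea, as a sketch: because cumcount makes every (ID, cols) pair unique, nothing needs to be aggregated, so this version swaps pivot_table for plain pivot and uses assign to avoid adding a column to the original dataframe. The sample data is rebuilt from the question, and the names occurrence and wide are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 2, 3, 3, 3],
                   "key": ["A", "B", "C", "D", "E", "E"]})

# Number each row within its ID group: 0 for the first occurrence, 1 for the next, ...
occurrence = df.groupby("ID").cumcount()

wide = (
    df.assign(cols="key" + occurrence.astype(str))   # key0, key1, key2
      .pivot(index="ID", columns="cols", values="key")
      .reset_index()                                 # ID back as a regular column
)
wide.columns.name = None  # drop the leftover 'cols' label on the column axis
print(wide)
```

Note that the column labels are sorted as strings, so with ten or more duplicates of one ID, key10 would be placed before key2.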

Source

2016-08-03 05:57:03 jezrael
