2015-04-23

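Pandas pivot and create extra columns for duplicates

I have some data with a repeated index and the columns I want to pivot into. Example: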

import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3],
    "contact_type": ["email", "phone", "phone", "email", "mobile", "email", "phone", "mobile"],
    "contact": ["[email protected]", "123", "456", "[email protected]", "78432", "[email protected]", "12", "12"]
})

What I want is for each id to end up as a single row. My ideal output would be:

ID  email            phone  phone.1  mobile
1   [email protected]  123    456      NaN
2   [email protected]  NaN    NaN      78432
3   [email protected]  12     NaN      12

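Trying df.pivot("id", "contact_type", "contact") gives me the error "Index contains duplicate entries, cannot reshape". The problem is that id 1 has two phone entries under contact_type, and pivot does not accept that. Is there another way to get the data into this format?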
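
To make the failure concrete, here is a minimal sketch that lists the rows sharing an (id, contact_type) pair and then reproduces the error. The email addresses are placeholders I made up, since the originals are obscured in the post, and `dupes` and `pivot_failed` are just illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3],
    "contact_type": ["email", "phone", "phone", "email",
                     "mobile", "email", "phone", "mobile"],
    # Placeholder addresses (assumption): the real ones are hidden in the post.
    "contact": ["a@example.com", "123", "456", "b@example.com",
                "78432", "c@example.com", "12", "12"],
})

# Rows whose (id, contact_type) pair occurs more than once.
dupes = df[df.duplicated(["id", "contact_type"], keep=False)]
print(dupes)

# pivot needs each (index, columns) pair to be unique, so this raises.
try:
    df.pivot(index="id", columns="contact_type", values="contact")
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print(exc)
```

Only the two phone rows for id 1 show up in `dupes`, so those are the rows that need a distinguishing label before pivot can work.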

Answer

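I think you have to assemble the final DataFrame piece by piece (with pd.concat), because you don't know in advance the maximum number of different phone numbers an id may have. Assuming each id has at most one email and at most one mobile number: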

In [130]: df_mail = df.loc[df.contact_type == 'email', ['contact', 'id']].set_index('id')

In [131]: df_mobile = df.loc[df.contact_type == 'mobile', ['contact', 'id']].set_index('id')

In [132]: # .copy() so that adding a column below does not touch a view of df
     ...: df_phone = df.loc[df.contact_type == 'phone', ['contact', 'id']].copy()

In [133]: # make a column that stores 'Phone0', 'Phone1' and so on:
     ...: df_phone['field'] = 'Phone' + df_phone.groupby('id').cumcount().map(str)

In [134]: df_phone = df_phone.pivot(index='id', columns='field', values='contact')

In [135]: df_mail.columns = ['Email']
     ...: df_mobile.columns = ['Mobile']

In [136]: print(pd.concat((df_mail, df_phone, df_mobile), axis=1))
     Email Phone0 Phone1 Mobile 
id        
1 [email protected] 123 456 NaN 
2 [email protected] NaN NaN 78432 
3 [email protected]  12 NaN  12
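
If the "at most one email or mobile" assumption does not hold, a variation on the same idea is to number the repeats within every (id, contact_type) group and pivot once. This is only a sketch and not what the answer above does. The `field` column and the ".1" suffix scheme are my own choices to mimic the asker's desired header, and the email addresses are placeholders.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3],
    "contact_type": ["email", "phone", "phone", "email",
                     "mobile", "email", "phone", "mobile"],
    # Placeholder addresses (assumption): the real ones are hidden in the post.
    "contact": ["a@example.com", "123", "456", "b@example.com",
                "78432", "c@example.com", "12", "12"],
})

# Position of each row within its (id, contact_type) group: 0, 1, 2, ...
n = df.groupby(["id", "contact_type"]).cumcount()

# First occurrence keeps the plain name; repeats become "phone.1", "phone.2", ...
field = df["contact_type"].where(n == 0, df["contact_type"] + "." + n.astype(str))

# (id, field) is now unique, so pivot no longer raises.
wide = df.assign(field=field).pivot(index="id", columns="field", values="contact")
print(wide)
```

The pivoted columns come out in sorted order (email, mobile, phone, phone.1); reindexing `wide` with an explicit column list would give the order shown in the question.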