2017-06-22 100 views
2

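pandas: repeat dataframe columns based on a reference dictionary

I need to rename and repeat my dataframe columns based on a reference dictionary. Below I have created a dummy dataframe: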

import pandas as pd

rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'],
           'entity': ['present', 'absent', 'absent', 'present', 'present'],
           'entity2': ['present', 'present', 'present', 'absent', 'absent'],
           'entity3': ['absent', 'absent', 'absent', 'present', 'absent']}
df = pd.DataFrame(rawdata)
df = df.set_index('id')  # set_index returns a new frame, so assign the result
df

        entity  entity2  entity3
id
json   present  present   absent
molly   absent  present   absent
tina    absent  present   absent
jake   present   absent  present
molly  present   absent   absent

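Now I have the following example dictionary:

ref_dict = {'entity': ['entity_exp1'],
            'entity2': ['entity2_exp1', 'entity2_exp2'],
            'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3']}

I need to replace the column names with the values from the dictionary, and if a column maps to more than one value, the column should be repeated once per value. The following is the dataframe I want: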

       entity_exp1  entity2_exp1  entity2_exp2  entity3_exp1  entity3_exp2  entity3_exp3
id
json       present       present       present        absent        absent        absent
molly       absent       present       present        absent        absent        absent
tina        absent       present       present        absent        absent        absent
jake       present        absent        absent       present       present       present
molly      present        absent        absent        absent        absent        absent
+0

Thanks for accepting my answer. Feel free to upvote the answers as well. – piRSquared

+0

Thank you, piRSquared. You always have the most amazing solutions. – Rtut

Answers

1

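Option 1
Use pd.concat on a dictionary comprehension (df is assumed to have id set as its index)

pd.concat({new: df[old] for old, names in ref_dict.items() for new in names}, axis=1)

Depending on the pandas version, the columns may come out in sorted key order (as below) or in the dict's insertion order.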

       entity2_exp1  entity2_exp2  entity3_exp1  entity3_exp2  entity3_exp3  entity_exp1
id
json        present       present        absent        absent        absent      present
molly       present       present        absent        absent        absent       absent
tina        present       present        absent        absent        absent       absent
jake         absent        absent       present       present       present      present
molly        absent        absent        absent        absent        absent      present
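
For reference, here is a self-contained sketch of Option 1, assuming a recent pandas version and that `id` is set as the index. The names `pieces`, `wanted` and `result` are illustrative and not part of the original answer. Selecting the columns with an explicit list afterwards makes the column order independent of how a given pandas version orders the dict keys.

```python
import pandas as pd

rawdata = {
    'id': ['json', 'molly', 'tina', 'jake', 'molly'],
    'entity': ['present', 'absent', 'absent', 'present', 'present'],
    'entity2': ['present', 'present', 'present', 'absent', 'absent'],
    'entity3': ['absent', 'absent', 'absent', 'present', 'absent'],
}
df = pd.DataFrame(rawdata).set_index('id')

ref_dict = {
    'entity': ['entity_exp1'],
    'entity2': ['entity2_exp1', 'entity2_exp2'],
    'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3'],
}

# One Series per new name, each taken from the matching old column
pieces = {new: df[old] for old, names in ref_dict.items() for new in names}
result = pd.concat(pieces, axis=1)

# Fix the column order explicitly instead of relying on dict key ordering
wanted = [new for names in ref_dict.values() for new in names]
result = result[wanted]
print(result)
```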

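Option 2
Slice the dataframe with repeated columns, then rename the columns

# number of new names for each existing column
repeats = [len(ref_dict[col]) for col in df.columns]
# repeat each column that many times (reindex_axis is no longer available in recent pandas)
d1 = df.reindex(columns=df.columns.repeat(repeats))
# flatten the lists of new names, following the same column order
d1.columns = [new for col in df.columns for new in ref_dict[col]]
d1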

       entity_exp1  entity2_exp1  entity2_exp2  entity3_exp1  entity3_exp2  entity3_exp3
id
json       present       present       present        absent        absent        absent
molly       absent       present       present        absent        absent        absent
tina        absent       present       present        absent        absent        absent
jake       present        absent        absent       present       present       present
molly      present        absent        absent        absent        absent        absent
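
To make the repeat step in Option 2 easier to follow, here is a minimal sketch on a tiny frame. The frame `small`, the dict `mapping` and the other variable names are made up for illustration. It shows that `Index.repeat` yields each column label as many times as it has new names, and that reindexing on that repeated index duplicates the columns.

```python
import pandas as pd

# Hypothetical toy data, only to show the mechanics
small = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
mapping = {'a': ['a_1'], 'b': ['b_1', 'b_2']}

# How many times each existing column has to appear
repeats = [len(mapping[col]) for col in small.columns]
repeated = small.columns.repeat(repeats)
print(list(repeated))  # ['a', 'b', 'b']

# Reindexing on the repeated labels duplicates the columns
wide = small.reindex(columns=repeated)
# Assign the flattened new names in the same order
wide.columns = [new for col in small.columns for new in mapping[col]]
print(wide)
```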
0

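For each column in df, you can look up its new column names in ref_dict, create a new column for each of them, and finally delete the old one. You can try the following: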

# ref_dict maps each old column to the list of its new column names
for old_column, new_columns in ref_dict.items():
    for new_column in new_columns:
        df[new_column] = df[old_column]  # same content as the old column
    del df[old_column]  # remove the old column once it has been copied
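
The loop above modifies df in place. If the original frame should stay untouched, the same idea can be wrapped in a small function. This is only a sketch: the name `expand_columns` is hypothetical. It also skips the delete when a new name happens to equal the old one, a case the loop above does not handle.

```python
import pandas as pd


def expand_columns(frame, mapping):
    """Return a copy of frame with each old column replaced by its new names."""
    out = frame.copy()
    for old, names in mapping.items():
        for new in names:
            # to_numpy() copies the values positionally, without index alignment
            out[new] = frame[old].to_numpy()
        if old not in names:
            out = out.drop(columns=old)
    return out


df = pd.DataFrame({
    'id': ['json', 'molly', 'tina', 'jake', 'molly'],
    'entity': ['present', 'absent', 'absent', 'present', 'present'],
    'entity2': ['present', 'present', 'present', 'absent', 'absent'],
    'entity3': ['absent', 'absent', 'absent', 'present', 'absent'],
}).set_index('id')

ref_dict = {
    'entity': ['entity_exp1'],
    'entity2': ['entity2_exp1', 'entity2_exp2'],
    'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3'],
}

expanded = expand_columns(df, ref_dict)
print(expanded)
```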
0

You can simply loop:

import pandas as pd

rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'],
           'entity': ['present', 'absent', 'absent', 'present', 'present'],
           'entity2': ['present', 'present', 'present', 'absent', 'absent'],
           'entity3': ['absent', 'absent', 'absent', 'present', 'absent']}
df = pd.DataFrame(rawdata)  # 'id' stays a regular column here; it becomes the index at the end
ref_dict = {'entity': ['entity_exp1'],
            'entity2': ['entity2_exp1', 'entity2_exp2'],
            'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3']}

# here comes the new part:
df2 = pd.DataFrame()
for key, val in sorted(ref_dict.items()):
    for subval in val:
        df2[subval] = df[key]  # copy the old column under each new name

df2['id'] = df['id']
df2.set_index('id', inplace=True)

print(df2)
       entity_exp1  entity2_exp1  entity2_exp2  entity3_exp1  entity3_exp2  entity3_exp3
id
json       present       present       present        absent        absent        absent
molly       absent       present       present        absent        absent        absent
tina        absent       present       present        absent        absent        absent
jake       present        absent        absent       present       present       present
molly      present        absent        absent        absent        absent        absent
0

You can reindex your DF using the dict keys as column names, and then rename the columns using the dict values.

df_new = df.reindex(columns=sum([[k]*len(v) for k,v in ref_dict.items()],[])) 
df_new.columns=sum(ref_dict.values(),[]) 
df_new 
Out[573]: 
   entity_exp1  entity2_exp1  entity2_exp2  entity3_exp1  entity3_exp2  entity3_exp3
0      present       present       present        absent        absent        absent
1       absent       present       present        absent        absent        absent
2       absent       present       present        absent        absent        absent
3      present        absent        absent       present       present       present
4      present        absent        absent        absent        absent        absent
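
As a variation on the same reindex-then-rename idea, the two lists can be flattened with `itertools.chain.from_iterable` instead of `sum(..., [])`. This is a sketch under the assumption that `id` should be kept as the index, since reindexing on the dict keys alone drops the `id` column. The names `old_cols` and `new_cols` are illustrative.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    'id': ['json', 'molly', 'tina', 'jake', 'molly'],
    'entity': ['present', 'absent', 'absent', 'present', 'present'],
    'entity2': ['present', 'present', 'present', 'absent', 'absent'],
    'entity3': ['absent', 'absent', 'absent', 'present', 'absent'],
})
ref_dict = {
    'entity': ['entity_exp1'],
    'entity2': ['entity2_exp1', 'entity2_exp2'],
    'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3'],
}

# Each old label repeated once per new name, in dict order
old_cols = list(chain.from_iterable(
    [old] * len(names) for old, names in ref_dict.items()))
# All new names flattened in the same order
new_cols = list(chain.from_iterable(ref_dict.values()))

# Move 'id' into the index first so that it survives the column reindex
df_new = df.set_index('id').reindex(columns=old_cols)
df_new.columns = new_cols
print(df_new)
```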