2017-04-24

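Pandas: reshape repeated rows

I want to reshape a DataFrame whose rows repeat. The data comes from a CSV file made up of repeated blocks of data.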

For example:

     Name 1st 2nd
0  Value1  a1  b1
1  Value2  a2  b2
2  Value3  a3  b3
3  Value1  a4  b4
4  Value2  a5  b5
5  Value3  a6  b6

should be reshaped into:

Name  1st 2nd 3rd 4th 
Value1 a1 b1 a4 b4 
Value2 a2 b2 a5 b5 
Value3 a3 b3 a6 b6 
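To make the example reproducible, the sample input above can be built directly in pandas. This is only a sketch: in the real case the frame would come from the CSV file (for example with pd.read_csv), whose exact layout is not shown here.

```python
import pandas as pd

# Sample input from the question: the names Value1..Value3 repeat in two blocks.
df = pd.DataFrame({
    'Name': ['Value1', 'Value2', 'Value3', 'Value1', 'Value2', 'Value3'],
    '1st': ['a1', 'a2', 'a3', 'a4', 'a5', 'a6'],
    '2nd': ['b1', 'b2', 'b3', 'b4', 'b5', 'b6'],
})
print(df)
```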

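Any suggestions on how to do this? I have already looked at this thread, but I can't see how to apply that approach to my problem, where the column I group by has more than one column to its right.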

Answers


You can use set_index and stack to combine your two columns into one, cumcount to get the new column labels, and pivot to do the reshaping:

import pandas as pd

# Stack the 1st and 2nd columns, and use cumcount to get the new column labels.
df = df.set_index('Name').stack().reset_index(level=1, drop=True).to_frame()
df['new_col'] = df.groupby(level='Name').cumcount()

# Perform a pivot to get the desired shape.
df = df.pivot(columns='new_col', values=0)

# Formatting: move Name back into a column and drop the 'new_col' axis name.
df = df.reset_index().rename_axis(columns=None)

The resulting output:

     Name   0   1   2   3
0  Value1  a1  b1  a4  b4
1  Value2  a2  b2  a5  b5
2  Value3  a3  b3  a6  b6
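Putting the steps above together on the sample data gives the following self-contained sketch. It names the stacked column 'value' instead of relying on the default label 0, and the final relabelling to 1st..4th is an added step, not part of the answer, which assumes exactly four value columns (two blocks of two) so that the labels match the ones asked for in the question.

```python
import pandas as pd

# Sample data from the question (two repeated blocks).
df = pd.DataFrame({
    'Name': ['Value1', 'Value2', 'Value3', 'Value1', 'Value2', 'Value3'],
    '1st': ['a1', 'a2', 'a3', 'a4', 'a5', 'a6'],
    '2nd': ['b1', 'b2', 'b3', 'b4', 'b5', 'b6'],
})

# Stack '1st'/'2nd' into one column, keeping Name as the index.
stacked = (
    df.set_index('Name')
      .stack()
      .reset_index(level=1, drop=True)
      .to_frame('value')
)
# Running number of each value within its Name: 0, 1, 2, 3.
stacked['new_col'] = stacked.groupby(level='Name').cumcount()

# Each running number becomes its own column.
wide = stacked.pivot(columns='new_col', values='value')
wide = wide.reset_index().rename_axis(columns=None)

# Added step (assumes exactly 4 value columns): relabel 0..3 as in the question.
wide.columns = ['Name', '1st', '2nd', '3rd', '4th']
print(wide)
```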

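Group by Name, build a DataFrame from each name's second (repeated) row, and merge that DataFrame with the original. Because it takes only the second occurrence, this assumes each Name appears exactly twice.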

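import pandas as pd

# Take each Name's second occurrence (the second block) of the 1st/2nd values.
df1 = df.groupby('Name')[['1st', '2nd']].apply(lambda x: x.iloc[1]).reset_index()
df1.columns = ['Name', '3rd', '4th']
# Keep the first occurrence of each Name and attach the second block as new columns.
df = df.drop_duplicates(subset=['Name']).merge(df1, on='Name')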

You get:

     Name 1st 2nd 3rd 4th
0  Value1  a1  b1  a4  b4
1  Value2  a2  b2  a5  b5
2  Value3  a3  b3  a6  b6
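The merge approach picks up only the second occurrence of each Name (x.iloc[1]), so it handles exactly two blocks. The sketch below generalizes the same idea of joining blocks side by side on Name to any number of blocks, assuming every block lists the same names: number the rows within each Name with cumcount, split the frame by that block number, and concatenate the pieces along the columns. The ordinal helper is hypothetical, written only to produce the 1st/2nd/3rd/... labels.

```python
import pandas as pd

# Sample data from the question (two repeated blocks).
df = pd.DataFrame({
    'Name': ['Value1', 'Value2', 'Value3', 'Value1', 'Value2', 'Value3'],
    '1st': ['a1', 'a2', 'a3', 'a4', 'a5', 'a6'],
    '2nd': ['b1', 'b2', 'b3', 'b4', 'b5', 'b6'],
})


def ordinal(n):
    """Hypothetical helper: 1 -> '1st', 2 -> '2nd', 3 -> '3rd', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'


# Block number of each row: 0 for a Name's first occurrence, 1 for its second, ...
block = df.groupby('Name').cumcount()

# One frame per block, indexed by Name, joined side by side on that index.
wide = pd.concat(
    [chunk.set_index('Name') for _, chunk in df.groupby(block)],
    axis=1,
)

# Replace the repeated '1st'/'2nd' labels with running ordinals.
wide.columns = [ordinal(i + 1) for i in range(wide.shape[1])]
wide = wide.reset_index()
print(wide)
```

If some Name is missing from a later block, pd.concat does an outer join on the index and the gap shows up as NaN rather than raising an error.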