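pandas: merging duplicate DataFrame columns while keeping the column names

How can I merge duplicate DataFrame columns and keep all of the original column names?

For example, if I have the DataFrame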

import pandas as pd

df = pd.DataFrame({"col1": [0, 0, 1, 2, 5, 3, 7],
                   "col2": [0, 1, 2, 3, 3, 3, 4],
                   "col3": [0, 1, 2, 3, 3, 3, 4]})

I can drop the duplicate columns (yes, the transpose is slow for large DataFrames) with

df.T.drop_duplicates().T

But this keeps only one name for each unique column.
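
To make the loss of names concrete, here is a small sketch that reuses the example frame from the question. It uses only the calls already shown above; the variable names `deduped` and `remaining` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"col1": [0, 0, 1, 2, 5, 3, 7],
                   "col2": [0, 1, 2, 3, 3, 3, 4],
                   "col3": [0, 1, 2, 3, 3, 3, 4]})

# Transpose so columns become rows, drop duplicate rows, transpose back.
deduped = df.T.drop_duplicates().T

# drop_duplicates keeps the first occurrence by default,
# so only the first name of each group of identical columns survives.
remaining = list(deduped.columns)
print(remaining)
```

Here `remaining` is `['col1', 'col2']`: the name `col3` is dropped and nothing records that it was identical to `col2`.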

How can I keep the information about which columns were merged? For example, something like

   [col1]  [col2, col3]
0       0             0
1       0             1
2       1             2
3       2             3
4       5             3
5       3             3
6       7             4

Thanks!

2016-12-25 Matt

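Fair warning: you may not want to store column headers the way the desired result shows. Headers are not meant to be lists. What if you have 12 duplicate columns? – Parfait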
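
One way to act on this warning, sketched here as an assumption rather than something proposed in the thread, is to keep plain string headers and record the merged names in a separate dict. The helper name `group_duplicate_columns` is hypothetical. It assumes unique column names and no missing values (NaN does not compare equal to itself, so columns containing it may not be grouped).

```python
import pandas as pd


def group_duplicate_columns(frame):
    """Map the first column of each group of identical columns to all names in that group."""
    groups = {}
    for name in frame.columns:
        # A tuple of the column's values is hashable, so it can serve as a dict key.
        key = tuple(frame[name].tolist())
        groups.setdefault(key, []).append(name)
    # Key each group by the name that is kept; dicts preserve insertion order.
    return {names[0]: names for names in groups.values()}


df = pd.DataFrame({"col1": [0, 0, 1, 2, 5, 3, 7],
                   "col2": [0, 1, 2, 3, 3, 3, 4],
                   "col3": [0, 1, 2, 3, 3, 3, 4]})

merged = group_duplicate_columns(df)
unique_df = df[list(merged)]   # one representative column per group
print(merged)
```

With the example frame, `merged` is `{'col1': ['col1'], 'col2': ['col2', 'col3']}`, so the headers stay ordinary strings and the grouping can still be looked up, however many duplicates there are.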

# group columns by their values 
grouped_columns = df.groupby(list(df.values), axis=1).apply(lambda g: g.columns.tolist()) 

# pick one column from each group of the columns 
unique_df = df.loc[:, grouped_columns.str[0]] 

# make a new column name for each group, don't think the list can work as a column name, you need to join them 
unique_df.columns = grouped_columns.apply("-".join) 

unique_df
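
The `axis=1` argument to `groupby` is deprecated in recent pandas releases (2.1 and later). A rough equivalent, offered as a sketch rather than as the answer's own code, groups the rows of the transposed frame instead. It assumes unique column names, hashable row labels, and no missing values; `transposed`, `keys`, and `names` are illustrative variable names.

```python
import pandas as pd

df = pd.DataFrame({"col1": [0, 0, 1, 2, 5, 3, 7],
                   "col2": [0, 1, 2, 3, 3, 3, 4],
                   "col3": [0, 1, 2, 3, 3, 3, 4]})

transposed = df.T                   # the original columns are now rows
keys = list(transposed.columns)     # group on every value of each original column

# Each group holds the original columns that share identical values.
# sort=False keeps the groups in order of first appearance.
names = [list(group.index) for _, group in transposed.groupby(keys, sort=False)]

# Keep the first column of each group and join the group's names into one label.
unique_df = df[[group[0] for group in names]].copy()
unique_df.columns = ["-".join(group) for group in names]
print(unique_df)
```

For the example frame this yields the columns `col1` and `col2-col3`. Like the transpose in the question, this can be slow for large DataFrames.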

2016-12-25 20:29:04 Psidom

I also use T and tuple with groupby:

def f(x): 
    d = x.iloc[[0]] 
    d.index = ['-'.join(x.index.tolist())] 
    return d 

df.T.groupby(df.apply(tuple), group_keys=False).apply(f).T

2016-12-26 03:44:30 piRSquared
