Pandas: create permutations of one column grouped by another column
I have a DataFrame like this:
import numpy as np
import pandas as pd

df_want = pd.DataFrame([['jon snow', 'jon-snow', 'jon-snow'],
                        ['jon snow', 'jon-snow', 'jon+snow'],
                        ['jon snow', 'jon-snow', 'jonsnow'],
                        ['jon snow', 'jon-snow', np.nan],
                        ['jon snow', 'jon+snow', 'jon-snow'],
                        ['jon snow', 'jon+snow', 'jon+snow'],
                        ['jon snow', 'jon+snow', 'jonsnow'],
                        ['jon snow', 'jon+snow', np.nan],
                        ['jon snow', 'jonsnow', 'jon-snow'],
                        ['jon snow', 'jonsnow', 'jon+snow'],
                        ['jon snow', 'jonsnow', 'jonsnow'],
                        ['jon snow', 'jonsnow', np.nan],
                        ['jon snow', np.nan, 'jon-snow'],
                        ['jon snow', np.nan, 'jon+snow'],
                        ['jon snow', np.nan, 'jonsnow'],
                        ['jon snow', np.nan, np.nan]],
                       columns=['name', 'name_variation', 'name_variation_2'])
I tried the following, which works, but it feels long-winded:
def combinations(df):
    # Expects the columns 'brand' and 'brand_variation'.
    # .copy() avoids SettingWithCopyWarning on the assignments below.
    df = df.drop_duplicates().dropna().copy()
    df['k'] = df['brand_variation']
    df['val'] = 1
    frames = []
    for res in df['brand'].unique():
        dfm = df[df['brand'] == res]
        # Pivot k against brand_variation, then stack back to long form to get every pair.
        dfk = pd.pivot_table(dfm, index=['brand', 'k'], columns=['brand_variation'], values=['val'], fill_value=0, aggfunc=['sum']).stack().reset_index()
        dfk.columns = dfk.columns.get_level_values(level=0)
        frames.append(dfk[['brand', 'k', 'brand_variation']])
    if not frames:
        return pd.DataFrame(columns=['brand', 'k', 'brand_variation'])
    # DataFrame.append was removed in pandas 2.0, so concatenate once at the end.
    return pd.concat(frames, ignore_index=True)
Is there a better way to do this?
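One possibly shorter route, offered as a sketch rather than a definitive answer: a self-merge on the grouping column pairs every variation with every variation in the same group. The input frame below is an assumption, since the question shows only the desired output, and `variation_pairs`, `key` and `col` are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd

# Assumed input: one row per (name, variation). Not taken from the question.
df = pd.DataFrame({
    "name": ["jon snow"] * 4,
    "name_variation": ["jon-snow", "jon+snow", "jonsnow", np.nan],
})


def variation_pairs(df, key="name", col="name_variation"):
    """Return every ordered pair of `col` values within each `key` group."""
    # Keep only the two relevant columns and drop repeated rows.
    base = df[[key, col]].drop_duplicates()
    # Merging the frame with itself on the key pairs each row with every
    # row of the same group; the right-hand copy of `col` gets the "_2" suffix.
    pairs = base.merge(base, on=key, suffixes=("", "_2"))
    return pairs.reset_index(drop=True)


result = variation_pairs(df)
print(result)
```

With the four variations assumed above (NaN included), this gives 4 x 4 = 16 rows with the columns name, name_variation and name_variation_2. If NaN pairs are not wanted, call `.dropna()` on the input before merging.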