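First convert each set to a str with astype(str) and remove the surrounding {} with str.strip. Then call str.get_dummies to build one indicator column per card. Finally, use add_prefix: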
import pandas as pd

df = pd.DataFrame({'Name':['John','Mary','Dan','Peter','Ed'],
                   'cards':[set(['A','B']), set(['B','C','A']),
                            set(['D','A']), set(['C','A']), set(['A','C','D'])]})
print (df)
    Name      cards
0   John     {A, B}
1   Mary  {A, C, B}
2    Dan     {A, D}
3  Peter     {A, C}
4     Ed  {A, D, C}
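# Stringify each set, e.g. "{'A', 'B'}", then drop the braces.
# The result goes into new names so that df keeps its set-valued 'cards' column.
s = df.set_index('Name').cards.astype(str).str.strip('{}')
# One indicator column per label; labels are separated by ', '
res = s.str.get_dummies(', ')
# The labels still carry the quotes from the set repr, so strip them
res.columns = res.columns.str.strip("'")
res = res.add_prefix('Card_').reset_index()
print (res)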
    Name  Card_A  Card_B  Card_C  Card_D
0   John       1       1       0       0
1   Mary       1       1       1       0
2    Dan       1       0       0       1
3  Peter       1       0       1       0
4     Ed       1       0       1       1
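To see why the quotes have to be stripped from the column names, it may help to trace one set through the string steps in plain Python. This is only an illustrative sketch (the names `text`, `raw_labels` and `labels` are made up here), and it assumes that no label itself contains ', ', a brace, or a quote, since any of those would break the split or the strip.

```python
cards = {'A', 'B'}

# str() of a set of strings keeps the quotes, e.g. "{'A', 'B'}" (element order may vary)
text = str(cards).strip('{}')

# Splitting on ', ' leaves quoted labels such as "'A'"
raw_labels = text.split(', ')

# Stripping the quotes recovers the bare labels; sorted() makes the order deterministic
labels = sorted(label.strip("'") for label in raw_labels)
print(labels)
```

The pandas version does the same thing column-wise: str.get_dummies(', ') performs the split, and str.strip("'") on the columns removes the quotes.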
An alternative solution:
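def f(category_list):
    n_categories = len(category_list)
    # Map every card in the set to 1; cards missing from a row become NaN
    return pd.Series(dict(zip(category_list, [1]*n_categories)))

# df here is the original frame, with sets in the 'cards' column
df1 = (df.set_index('Name').cards
         .apply(f)
         .sort_index(axis=1)   # set order is arbitrary, so sort the columns
         .add_prefix('Card_')
         .fillna(0)
         .astype(int)
         .reset_index())
print (df1)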
    Name  Card_A  Card_B  Card_C  Card_D
0   John       1       1       0       0
1   Mary       1       1       1       0
2    Dan       1       0       0       1
3  Peter       1       0       1       0
4     Ed       1       0       1       1
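A possible third variant, not part of the answer above, avoids relying on the set repr: join each set into a '|'-separated string and let str.get_dummies split on its default separator. This is a sketch under the assumption that no label contains '|'; the names `joined` and `out` and the two-row sample are illustrative only.

```python
import pandas as pd

# Small sample in the same shape as the frame above
df = pd.DataFrame({'Name': ['John', 'Mary'],
                   'cards': [{'A', 'B'}, {'B', 'C', 'A'}]})

# Join each set into a single string such as "A|B"; sorted() fixes the order
joined = df.set_index('Name')['cards'].map(lambda s: '|'.join(sorted(s)))

# str.get_dummies splits on '|' by default and builds one indicator column per label
out = joined.str.get_dummies().add_prefix('Card_').reset_index()
print(out)
```

Because the labels never pass through str(set), there are no braces or quotes to strip afterwards.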