The name is built from the column name workclass and the string values of that column:
import pandas as pd

data = pd.DataFrame({'age': [1, 1, 1, 2, 1, 1],
                     'workclass': ['Government Employee', 'Private Employee',
                                   'Self Employed', 'Self Employed Incorporated',
                                   'Self Employed Incorporated', '?']})
print(data)
   age                   workclass
0    1         Government Employee
1    1            Private Employee
2    1               Self Employed
3    2  Self Employed Incorporated
4    1  Self Employed Incorporated
5    1                           ?
data_dummies = pd.get_dummies(data)
print(data_dummies)
   age  workclass_?  workclass_Government Employee  \
0    1            0                              1
1    1            0                              0
2    1            0                              0
3    2            0                              0
4    1            0                              0
5    1            1                              0
   workclass_Private Employee  workclass_Self Employed  \
0                           0                        0
1                           1                        0
2                           0                        1
3                           0                        0
4                           0                        0
5                           0                        0
   workclass_Self Employed Incorporated
0                                     0
1                                     0
2                                     0
3                                     1
4                                     1
5                                     0
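The naming rule can be checked by rebuilding the names by hand. The following is a minimal sketch on a smaller, made-up frame: it assumes the default separator '_' and passes dtype=int only so the dummies are 0/1 integers (pandas 2.0 and later return booleans by default). The variables expected and dummy_columns are illustrative names, not part of pandas.

```python
import pandas as pd

# Small illustrative frame; '?' is treated like any other string value.
data = pd.DataFrame({
    'age': [1, 1, 3],
    'workclass': ['Government Employee', 'Private Employee', '?'],
})

# dtype=int is an assumption made here to get 0/1 integers instead of booleans.
data_dummies = pd.get_dummies(data, dtype=int)

# Rebuild the names by hand: column name + '_' + each unique value (sorted).
expected = ['workclass_' + value for value in sorted(data['workclass'].unique())]

# Every column except the numeric 'age' column is a generated dummy column.
dummy_columns = [col for col in data_dummies.columns if col != 'age']

print(dummy_columns)
print(dummy_columns == expected)
```

So workclass_? is simply the dummy column for rows whose workclass value is the string '?'.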
The prefix is really helpful if several columns contain the same values:
data = pd.DataFrame({'age': [1, 1, 3],
                     'workclass': ['Government Employee', 'Private Employee', '?'],
                     'workclass1': ['Government Employee', 'Private Employee', 'Self Employed']})
print(data)
   age            workclass           workclass1
0    1  Government Employee  Government Employee
1    1     Private Employee     Private Employee
2    3                    ?        Self Employed
data_dummies = pd.get_dummies(data)
print(data_dummies)
   age  workclass_?  workclass_Government Employee  \
0    1            0                              1
1    1            0                              0
2    3            1                              0
   workclass_Private Employee  workclass1_Government Employee  \
0                           0                               1
1                           1                               0
2                           0                               0
   workclass1_Private Employee  workclass1_Self Employed
0                            0                         0
1                            1                         0
2                            0                         1
If you don't need it, you can add parameters to replace the prefix and separator with empty strings:
data_dummies = pd.get_dummies(data, prefix='', prefix_sep='')
print(data_dummies)
   age  ?  Government Employee  Private Employee  Government Employee  \
0    1  0                    1                 0                    1
1    1  0                    0                 1                    0
2    3  1                    0                 0                    0
   Private Employee  Self Employed
0                 0              0
1                 1              0
2                 0              1
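Dropping the prefix leaves repeated column labels, which is easy to overlook in the printed output. The sketch below compares both variants on the same three-row frame; the names with_prefix, no_prefix and repeated are illustrative, and dtype=int is again only an assumption to keep 0/1 integers.

```python
import pandas as pd

data = pd.DataFrame({
    'age': [1, 1, 3],
    'workclass': ['Government Employee', 'Private Employee', '?'],
    'workclass1': ['Government Employee', 'Private Employee', 'Self Employed'],
})

with_prefix = pd.get_dummies(data, dtype=int)
no_prefix = pd.get_dummies(data, prefix='', prefix_sep='', dtype=int)

# With the default prefix every label is distinct; without it some repeat.
print(with_prefix.columns.is_unique)
print(no_prefix.columns.is_unique)

# Labels that occur more than once once the prefix is removed.
repeated = no_prefix.columns[no_prefix.columns.duplicated()].unique().tolist()
print(repeated)

# Selecting a repeated label returns a DataFrame, one column per occurrence.
print(no_prefix['Government Employee'].shape)
```

Because a repeated label selects several columns at once, the duplicates usually need to be merged before further processing.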
Then you can group by column name with groupby and aggregate with max to get a single dummy column per unique name:
print(data_dummies.groupby(level=0, axis=1).max())
   ?  Government Employee  Private Employee  Self Employed  age
0  0                    1                 0              0    1
1  0                    0                 1              0    1
2  1                    0                 0              1    3
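The axis=1 argument of groupby is deprecated as of pandas 2.1, so on newer versions the same merge can be sketched by transposing, grouping on the index, and transposing back. This is one possible equivalent rather than the method used in the answer; it assumes dtype=int so that all columns are integers and the transpose does not fall back to object dtype. The name merged is illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    'age': [1, 1, 3],
    'workclass': ['Government Employee', 'Private Employee', '?'],
    'workclass1': ['Government Employee', 'Private Employee', 'Self Employed'],
})

# Empty prefix and separator produce repeated labels such as 'Government Employee'.
data_dummies = pd.get_dummies(data, prefix='', prefix_sep='', dtype=int)

# Transpose so the column labels become the index, take the max per label,
# then transpose back so each unique label is a single column again.
merged = data_dummies.T.groupby(level=0).max().T

print(merged)
```

The max keeps a 1 wherever any of the merged columns had a 1, and the age column passes through unchanged because its label is already unique.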
Actually, we don't see workclass_? here, but the author mentions it. What is this column? – venkysmarty
I added it in the last edit, please check it now. – jezrael
Thanks. – venkysmarty