pd.Series.str.get_dummies and pd.DataFrame.dot
c = df['Country Affected'].str.get_dummies(sep=',')
s = df['Sector Affected'].str.get_dummies(sep=',')
c.T.dot(s)
    Engineering  Government  Media
CN            1           1      1
FR            0           1      0
RU            0           1      0
TW            1           0      1
US            1           0      1
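To see why the transpose-and-dot step counts co-occurrences, here is a minimal sketch. The two-row frame is hypothetical (it is not the asker's actual data); it was picked only so that the result matches the small table above.

```python
import pandas as pd

# Hypothetical two-row frame, not the asker's data.
df = pd.DataFrame({
    'Country Affected': ['US,TW,CN', 'FR,RU,CN'],
    'Sector Affected': ['Engineering,Media', 'Government'],
})

# One 0/1 indicator column per distinct token; rows are the original records.
c = df['Country Affected'].str.get_dummies(sep=',')
s = df['Sector Affected'].str.get_dummies(sep=',')

# c.T is (countries x records) and s is (records x sectors), so each cell of
# the product sums, over records, the cases where both indicators are 1.
counts = c.T.dot(s)
print(counts)
```

Each cell is the number of records that mention both that country and that sector, which is why CN gets a 1 in every column here: it appears in both rows.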
Larger example
import numpy as np
import pandas as pd

np.random.seed([3,1415])
countries = ['CN', 'FR', 'RU', 'TW', 'US', 'UK', 'JP', 'AU', 'HK']
sectors = ['Engineering', 'Government', 'Media', 'Commodidty']

def pick_rnd(x):
    # Pick between 1 and len(x) - 1 distinct items and join them with commas.
    i = np.random.randint(1, len(x))
    j = np.random.choice(x, i, replace=False)
    return ','.join(j)

df = pd.DataFrame({
    'Country Affected': [pick_rnd(countries) for _ in range(10)],
    'Sector Affected': [pick_rnd(sectors) for _ in range(10)]
})
df
          Country Affected               Sector Affected
0                       CN              Government,Media
1  FR,TW,JP,US,UK,CN,RU,AU         Commodidty,Government
2                 HK,AU,JP                    Commodidty
3           RU,CN,FR,JP,UK  Media,Commodidty,Engineering
4  CN,RU,FR,JP,TW,HK,US,UK   Government,Media,Commodidty
5                    FR,CN                    Commodidty
6     FR,HK,JP,TW,US,AU,CN                    Commodidty
7  CN,HK,RU,TW,UK,US,FR,JP              Media,Commodidty
8                 JP,UK,AU             Engineering,Media
9                 RU,UK,FR                         Media
Then
c = df['Country Affected'].str.get_dummies(sep=',')
s = df['Sector Affected'].str.get_dummies(sep=',')
c.T.dot(s)
    Commodidty  Engineering  Government  Media
AU           3            1           1      1
CN           6            1           3      4
FR           6            1           2      4
HK           4            0           1      2
JP           6            2           2      4
RU           4            1           2      4
TW           4            0           2      2
UK           4            2           2      5
US           4            0           2      2
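Would you mind editing your post to show the code you have so far?
Is it possible to change your database? (US, Engineering), (TW, Engineering), (CN, Engineering), etc. should each be separate rows.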
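For reference, the "separate rows" layout suggested in the last comment could look roughly like the sketch below. The input frame and the `Country` / `Sector` column names are assumptions made for illustration, not something taken from the question.

```python
import pandas as pd

# Hypothetical input in the original comma-separated layout.
df = pd.DataFrame({
    'Country Affected': ['US,TW,CN', 'FR,RU,CN'],
    'Sector Affected': ['Engineering,Media', 'Government'],
})

# Expand every record into one (country, sector) pair per row.
rows = [
    (country, sector)
    for country_list, sector_list in zip(df['Country Affected'], df['Sector Affected'])
    for country in country_list.split(',')
    for sector in sector_list.split(',')
]
long_df = pd.DataFrame(rows, columns=['Country', 'Sector'])

# With one pair per row, the country-by-sector counts are a plain cross-tabulation.
counts = pd.crosstab(long_df['Country'], long_df['Sector'])
print(counts)
```

On this input the cross-tabulation gives the same counts as the get_dummies / dot approach, so the choice between the two is mostly about which layout you want to store.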