I think you can use concat:
df_pivoted = countryKPI.pivot_table(index='country',
columns='indicator',
values='value',
fill_value=0)
print (df_pivoted)
indicator x z
country
Austria 7 7
Germany 8 9
print (pd.concat([customers.set_index('nationality'), df_pivoted], axis=1))
customer value x z
Austria second 8 7 7
Germany first 7 8 9
print (pd.concat([customers.set_index('nationality'), df_pivoted], axis=1)
.reset_index()
.rename(columns={'index':'nationality'})
[['customer','nationality','value','x','z']])
customer nationality value x z
0 second Austria 8 7 7
1 first Germany 7 8 9
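The snippets above do not show how customers and countryKPI were built. Here is a minimal self-contained sketch of the same concat approach. The sample data is an assumption inferred from the printed output, and rename_axis is swapped in for the rename on 'index' so that the key column is named the same way regardless of pandas version. Row order may also differ between versions.

```python
import pandas as pd

# Sample data inferred from the printed output above (an assumption).
countryKPI = pd.DataFrame({'country': ['Austria', 'Germany', 'Germany', 'Austria'],
                           'indicator': ['z', 'x', 'z', 'x'],
                           'value': [7, 8, 9, 7]})
customers = pd.DataFrame({'customer': ['first', 'second'],
                          'nationality': ['Germany', 'Austria'],
                          'value': [7, 8]})

# One row per country, one column per indicator.
df_pivoted = countryKPI.pivot_table(index='country', columns='indicator',
                                    values='value', fill_value=0)

# Align both frames on the country name, then turn the index back into a column.
result = (pd.concat([customers.set_index('nationality'), df_pivoted], axis=1)
            .rename_axis('nationality')
            .reset_index()
            [['customer', 'nationality', 'value', 'x', 'z']])
print(result)
```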
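Edit by comment:
The problem is that the dtypes of the columns customers.nationality and countryKPI.country are category, and if some categories are missing from one of them, it raises an error: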
ValueError: incompatible categories in categorical concat
The solution is to build the combined set of categories with union and then apply it to both columns with set_categories:
import pandas as pd
import numpy as np
countryKPI = pd.DataFrame({'country':['Austria','Germany', 'Germany', 'Austria'],
'indicator':['z','x','z','x'],
'value':[7,8,9,7]})
customers = pd.DataFrame({'customer':['first','second'],
'nationality':['Slovakia','Austria'],
'value':[7,8]})
customers.nationality = customers.nationality.astype('category')
countryKPI.country = countryKPI.country.astype('category')
print (countryKPI.country.cat.categories)
Index(['Austria', 'Germany'], dtype='object')
print (customers.nationality.cat.categories)
Index(['Austria', 'Slovakia'], dtype='object')
all_categories = countryKPI.country.cat.categories.union(customers.nationality.cat.categories)
print (all_categories)
Index(['Austria', 'Germany', 'Slovakia'], dtype='object')
customers.nationality = customers.nationality.cat.set_categories(all_categories)
countryKPI.country = countryKPI.country.cat.set_categories(all_categories)
df_pivoted = countryKPI.pivot_table(index='country',
columns='indicator',
values='value',
fill_value=0)
print (df_pivoted)
indicator x z
country
Austria 7 7
Germany 8 9
Slovakia 0 0
print (pd.concat([customers.set_index('nationality'), df_pivoted], axis=1)
.reset_index()
.rename(columns={'index':'nationality'})
[['customer','nationality','value','x','z']])
customer nationality value x z
0 second Austria 8.0 7 7
1 NaN Germany NaN 8 9
2 first Slovakia 7.0 0 0
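If the category dtype is not needed downstream, another option (not the approach shown above, just a hedged alternative) may be to cast the key columns back to plain strings before reshaping, so that no categorical index is involved at all. In this sketch the unobserved country is filled with 0 after the concat, since the pivot no longer knows about it.

```python
import pandas as pd

countryKPI = pd.DataFrame({'country': ['Austria', 'Germany', 'Germany', 'Austria'],
                           'indicator': ['z', 'x', 'z', 'x'],
                           'value': [7, 8, 9, 7]})
customers = pd.DataFrame({'customer': ['first', 'second'],
                          'nationality': ['Slovakia', 'Austria'],
                          'value': [7, 8]})
customers['nationality'] = customers['nationality'].astype('category')
countryKPI['country'] = countryKPI['country'].astype('category')

# Assumption: the category dtype is only a storage detail here,
# so the key columns can be cast back to plain strings.
customers['nationality'] = customers['nationality'].astype(str)
countryKPI['country'] = countryKPI['country'].astype(str)

df_pivoted = countryKPI.pivot_table(index='country', columns='indicator',
                                    values='value', fill_value=0)

result = (pd.concat([customers.set_index('nationality'), df_pivoted], axis=1)
            .rename_axis('nationality')
            .reset_index())

# Countries without any KPI rows get NaN from the outer alignment; fill with 0.
result[['x', 'z']] = result[['x', 'z']].fillna(0)
print(result)
```

Countries without a customer still show NaN in the customer and value columns, as in the output above.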
If you need better performance, use groupby instead of pivot_table:
# Parentheses are needed so the method chain can span several lines
df_pivoted1 = (countryKPI.groupby(['country','indicator'])
                         .mean()
                         .squeeze()
                         .unstack()
                         .fillna(0))
print (df_pivoted1)
indicator x z
country
Austria 7.0 7.0
Germany 8.0 9.0
Slovakia 0.0 0.0
Timings:
In [177]: %timeit countryKPI.pivot_table(index='country', columns='indicator', values='value', fill_value=0)
100 loops, best of 3: 6.24 ms per loop
In [178]: %timeit countryKPI.groupby(['country','indicator']).mean().squeeze().unstack().fillna(0)
100 loops, best of 3: 4.28 ms per loop
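To check that the two approaches agree before switching, a small sketch like the following may help. It uses plain string columns (an assumption, to keep it independent of how a given pandas version handles unobserved categories) and selects the 'value' column directly instead of calling squeeze.

```python
import pandas as pd

countryKPI = pd.DataFrame({'country': ['Austria', 'Germany', 'Germany', 'Austria'],
                           'indicator': ['z', 'x', 'z', 'x'],
                           'value': [7, 8, 9, 7]})

via_pivot = countryKPI.pivot_table(index='country', columns='indicator',
                                   values='value', fill_value=0)

via_groupby = (countryKPI.groupby(['country', 'indicator'])['value']
                         .mean()
                         .unstack()
                         .fillna(0))

# Compare values only; the dtypes (int vs. float) may differ between versions.
same = (via_pivot.astype(float).values == via_groupby.astype(float).values).all()
print(same)
```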
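Interesting and simple. But http://imgur.com/a/PeCyh why do I get several other values for the initial dataset (0,1,2,3)?
I see - your latest edit invalidates my latest comment.
However, the following problem still remains: cannot insert an item into a CategoricalIndex that is not already an existing category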