Pandas: append multiple columns for a single one

How can I use pandas to efficiently append multiple KPI values for each single customer?

Joining df with the customers df causes some problems, because the country is the index of the data frame and the nationality is not in the index.

countryKPI = pd.DataFrame({'country':['Austria','Germany', 'Germany', 'Austria'], 
          'indicator':['z','x','z','x'], 
          'value':[7,8,9,7]}) 
customers = pd.DataFrame({'customer':['first','second'], 
          'nationality':['Germany','Austria'], 
          'value':[7,8]})

The desired result is shown in pink:

Source: 2016-09-22, Georg Heiler

You can work around the mismatch in categories with merge:

df = pd.pivot_table(data=countryKPI, index=['country'], columns=['indicator']) 
df.index.name = 'nationality'  
customers.merge(df['value'].reset_index(), on='nationality', how='outer')

Data:

countryKPI = pd.DataFrame({'country':['Austria','Germany', 'Germany', 'Austria'], 
          'indicator':['z','x','z','x'], 
          'value':[7,8,9,7]}) 
customers = pd.DataFrame({'customer':['first','second'], 
          'nationality':['Slovakia','Austria'], 
          'value':[7,8]})
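To see how the choice of `how` affects the result with this data, here is a minimal sketch. The names `kpi_wide`, `left`, and `outer` are mine, not the answer's, and the sketch assumes plain string columns, so no categorical dtypes are involved.

```python
import pandas as pd

countryKPI = pd.DataFrame({'country': ['Austria', 'Germany', 'Germany', 'Austria'],
                           'indicator': ['z', 'x', 'z', 'x'],
                           'value': [7, 8, 9, 7]})
customers = pd.DataFrame({'customer': ['first', 'second'],
                          'nationality': ['Slovakia', 'Austria'],
                          'value': [7, 8]})

# One row per country, one column per indicator.
kpi_wide = countryKPI.pivot_table(index='country', columns='indicator', values='value')
kpi_wide.index.name = 'nationality'   # match the key column in customers
kpi_wide.columns.name = None          # drop the 'indicator' label on the columns
kpi_wide = kpi_wide.reset_index()

# how='left' keeps exactly one row per customer; missing KPIs become NaN.
left = customers.merge(kpi_wide, on='nationality', how='left')

# how='outer' additionally keeps countries that have no customer (Germany here).
outer = customers.merge(kpi_wide, on='nationality', how='outer')

print(left)
print(outer)
```

If the goal is one row per customer, `how='left'` is probably the closer fit. `how='outer'` is useful when countries without customers should still appear.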

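The problem seems to be that the pivot operation leaves you with a CategoricalIndex in your DF, and reset_index then complains with that error.

Simply reverse-engineer it: check the dtypes of the countryKPI and customers DataFrames, and wherever category is mentioned, convert those columns to their string representation via astype(str).

Reproducing the error and working around it:

Assuming the DFs are the ones mentioned above: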

countryKPI['indicator'] = countryKPI['indicator'].astype('category') 
countryKPI['country'] = countryKPI['country'].astype('category') 
customers['nationality'] = customers['nationality'].astype('category') 

countryKPI.dtypes 
country  category 
indicator category 
value   int64 
dtype: object 

customers.dtypes 
customer   object 
nationality category 
value    int64 
dtype: object

After the pivot operation:

df = pd.pivot_table(data=countryKPI, index=['country'], columns=['indicator']) 
df.index 
CategoricalIndex(['Austria', 'Germany'], categories=['Austria', 'Germany'], ordered=False, 
        name='country', dtype='category') 
# ^^ See the categorical index

When you call reset_index on it:

df.reset_index()

TypeError: cannot insert an item into a CategoricalIndex that is not already an existing category

To fix this error, simply cast the categorical columns to str type.

countryKPI['indicator'] = countryKPI['indicator'].astype('str') 
countryKPI['country'] = countryKPI['country'].astype('str') 
customers['nationality'] = customers['nationality'].astype('str')

Now the reset_index part works, and so does the merge.
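If there are many categorical columns, the cast can be applied to all of them at once. The following is only a sketch: `categories_to_str` is a hypothetical helper name, not part of pandas, and it assumes that losing the categorical dtype is acceptable for your data.

```python
import pandas as pd

def categories_to_str(frame):
    """Return a copy of frame with every category column cast to str (hypothetical helper)."""
    out = frame.copy()
    for col in out.select_dtypes(include='category').columns:
        out[col] = out[col].astype(str)
    return out

countryKPI = pd.DataFrame({'country': ['Austria', 'Germany', 'Germany', 'Austria'],
                           'indicator': ['z', 'x', 'z', 'x'],
                           'value': [7, 8, 9, 7]})
customers = pd.DataFrame({'customer': ['first', 'second'],
                          'nationality': ['Slovakia', 'Austria'],
                          'value': [7, 8]})

# Recreate the situation described above: categorical key columns.
countryKPI['country'] = countryKPI['country'].astype('category')
countryKPI['indicator'] = countryKPI['indicator'].astype('category')
customers['nationality'] = customers['nationality'].astype('category')

kpi = categories_to_str(countryKPI)
cust = categories_to_str(customers)

# With plain strings the pivoted index is an ordinary Index.
wide = kpi.pivot_table(index='country', columns='indicator', values='value')
wide.index.name = 'nationality'
merged = cust.merge(wide.reset_index(), on='nationality', how='left')
print(merged)
```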

Source: 2016-09-22 09:43:56

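Interesting and simple. But http://imgur.com/a/PeCyh why do I get several other values (0,1,2,3) for the initial data set?

I see - your latest edit invalidates my latest comment.

However, the following problem still remains: cannot insert an item into a CategoricalIndex that is not already an existing category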

I think you can use concat:

df_pivoted = countryKPI.pivot_table(index='country', 
           columns='indicator', 
           values='value', 
           fill_value=0) 
print (df_pivoted)  
indicator x z 
country   
Austria 7 7 
Germany 8 9 

print (pd.concat([customers.set_index('nationality'), df_pivoted], axis=1)) 
     customer value x z 
Austria second  8 7 7 
Germany first  7 8 9      


print (pd.concat([customers.set_index('nationality'), df_pivoted], axis=1) 
     .reset_index() 
     .rename(columns={'index':'nationality'}) 
     [['customer','nationality','value','x','z']]) 

    customer nationality value x z 
0 second  Austria  8 7 7 
1 first  Germany  7 8 9
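As a possible alternative to concat, `DataFrame.join` with `on=` matches a column of the caller against the index of the other frame, which is the situation described in the question. This is a sketch of a different technique from the one used above; the name `joined` is mine, and it assumes the key columns are plain strings.

```python
import pandas as pd

countryKPI = pd.DataFrame({'country': ['Austria', 'Germany', 'Germany', 'Austria'],
                           'indicator': ['z', 'x', 'z', 'x'],
                           'value': [7, 8, 9, 7]})
customers = pd.DataFrame({'customer': ['first', 'second'],
                          'nationality': ['Germany', 'Austria'],
                          'value': [7, 8]})

df_pivoted = countryKPI.pivot_table(index='country', columns='indicator',
                                    values='value', fill_value=0)

# join(..., on=...) looks up each customer's 'nationality' in the index of
# df_pivoted; it is a left join by default, so customer rows keep their order.
joined = customers.join(df_pivoted, on='nationality')
print(joined)
```

Unlike the concat version, this keeps the original row order of customers and needs no reset_index or rename afterwards.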

Edit by comment:

The problem is that the dtypes of the columns customers.nationality and countryKPI.country are category, and if some categories are missing, it raises an error:

ValueError: incompatible categories in categorical concat

The solution is to build the full set of categories with union and then apply it with set_categories:

import pandas as pd 
import numpy as np 

countryKPI = pd.DataFrame({'country':['Austria','Germany', 'Germany', 'Austria'], 
          'indicator':['z','x','z','x'], 
          'value':[7,8,9,7]}) 
customers = pd.DataFrame({'customer':['first','second'], 
          'nationality':['Slovakia','Austria'], 
          'value':[7,8]}) 

customers.nationality = customers.nationality.astype('category') 
countryKPI.country = countryKPI.country.astype('category') 

print (countryKPI.country.cat.categories) 
Index(['Austria', 'Germany'], dtype='object') 

print (customers.nationality.cat.categories) 
Index(['Austria', 'Slovakia'], dtype='object') 

all_categories =countryKPI.country.cat.categories.union(customers.nationality.cat.categories) 
print (all_categories) 
Index(['Austria', 'Germany', 'Slovakia'], dtype='object') 

customers.nationality = customers.nationality.cat.set_categories(all_categories) 
countryKPI.country = countryKPI.country.cat.set_categories(all_categories)

df_pivoted = countryKPI.pivot_table(index='country', 
           columns='indicator', 
           values='value', 
           fill_value=0) 
print (df_pivoted)  
indicator x z 
country   
Austria 7 7 
Germany 8 9 
Slovakia 0 0   

print (pd.concat([customers.set_index('nationality'), df_pivoted], axis=1) 
     .reset_index() 
     .rename(columns={'index':'nationality'}) 
     [['customer','nationality','value','x','z']]) 

    customer nationality value x z 
0 second  Austria 8.0 7 7 
1  NaN  Germany NaN 8 9 
2 first Slovakia 7.0 0 0
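The union and set_categories steps can be wrapped in a small function. This is a minimal sketch: `align_categories` is a hypothetical helper name, and it assumes both inputs are unordered categorical Series.

```python
import pandas as pd

def align_categories(left, right):
    """Give two categorical Series the same category set (hypothetical helper)."""
    shared = left.cat.categories.union(right.cat.categories)
    return left.cat.set_categories(shared), right.cat.set_categories(shared)

nationality = pd.Series(['Slovakia', 'Austria'], dtype='category')
country = pd.Series(['Austria', 'Germany', 'Germany', 'Austria'], dtype='category')

nationality, country = align_categories(nationality, country)

# Both Series now share one categorical dtype, so their categories are compatible.
print(nationality.dtype == country.dtype)
print(list(nationality.cat.categories))
```

The values themselves are unchanged; only the set of allowed categories is widened on both sides.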

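If you need better performance, use groupby instead of pivot_table:

# the chained calls are wrapped in parentheses so they can span several lines
df_pivoted1 = (countryKPI.groupby(['country','indicator'])
                         .mean()
                         .squeeze()
                         .unstack()
                         .fillna(0))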
print (df_pivoted1) 
indicator x z 
country    
Austria 7.0 7.0 
Germany 8.0 9.0 
Slovakia 0.0 0.0

Timings:

In [177]: %timeit countryKPI.pivot_table(index='country', columns='indicator', values='value', fill_value=0) 
100 loops, best of 3: 6.24 ms per loop 

In [178]: %timeit countryKPI.groupby(['country','indicator']).mean().squeeze().unstack().fillna(0) 
100 loops, best of 3: 4.28 ms per loop
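Before swapping one for the other, it may be worth checking that both routes give the same table on your data. Here is a sketch using plain string columns; the names `via_pivot`, `via_groupby`, and `same` are mine, and the groupby variant selects the 'value' column directly instead of calling squeeze().

```python
import pandas as pd

countryKPI = pd.DataFrame({'country': ['Austria', 'Germany', 'Germany', 'Austria'],
                           'indicator': ['z', 'x', 'z', 'x'],
                           'value': [7, 8, 9, 7]})

via_pivot = countryKPI.pivot_table(index='country', columns='indicator',
                                   values='value', fill_value=0)

# Selecting 'value' first makes mean() return a Series, which unstack()
# turns into the same country-by-indicator layout.
via_groupby = (countryKPI
               .groupby(['country', 'indicator'])['value']
               .mean()
               .unstack(fill_value=0))

# Compare as floats, since the two routes may differ in integer vs float dtype.
same = via_pivot.astype(float).equals(via_groupby.astype(float))
print(same)
```

The timings above come from the answer's own machine and this tiny data set, so it is safer to rerun %timeit on real data before relying on the difference.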

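Source: 2016-09-22 08:50:59, jezrael

This almost works - but I get the error: incompatible categories in categorical concat

The problem is with the real data, right? I think the sample works perfectly. – jezrael

Unfortunately, yes.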
