How to count subgroups of categorical data in a pandas DataFrame?
I have the following pandas DataFrame:
import pandas as pd
import numpy as np
df = pd.DataFrame({"shops": ["shop1", "shop2", "shop3", "shop4", "shop5", "shop6"], "franchise" : ["franchise_A", "franchise_A", "franchise_A", "franchise_A", "franchise_B", "franchise_B"],"items" : ["dog", "cat", "dog", "dog", "bird", "fish"]})
df = df[["shops", "franchise", "items"]]
print(df)
   shops    franchise items
0  shop1  franchise_A   dog
1  shop2  franchise_A   cat
2  shop3  franchise_A   dog
3  shop4  franchise_A   dog
4  shop5  franchise_B  bird
5  shop6  franchise_B  fish
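So each row is a unique sample (shop1, shop2, and so on), and each sample belongs to a subgroup (franchise_A, franchise_B, franchise_C, and so on). In the items column there are only four possible categorical values: dog, cat, fish, bird. My motivation is to create a bar chart of the number of dog, cat, fish, and bird entries for each franchise.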
The output I would like is:
franchise    dogs  cats  birds  fish
franchise_A     3     1      0     0
franchise_B     0     0      1     1
I believe I first need to use groupby(), for example:
df.groupby("franchise").count()
             shops  items
franchise
franchise_A      4      4
franchise_B      2      2
But I don't know how to count the number of each item per franchise.
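For reference, here is a minimal sketch of one way to get per-franchise item counts from the DataFrame above: group by franchise, count the values in items, and pivot the item labels into columns. The variable names counts and alt are only illustrative, and the column order is forced with reindex because unstack sorts the labels alphabetically. pd.crosstab is shown as an alternative that should give the same table.

```python
import pandas as pd

df = pd.DataFrame({
    "shops": ["shop1", "shop2", "shop3", "shop4", "shop5", "shop6"],
    "franchise": ["franchise_A", "franchise_A", "franchise_A",
                  "franchise_A", "franchise_B", "franchise_B"],
    "items": ["dog", "cat", "dog", "dog", "bird", "fish"],
})

# Count each item within each franchise, then move the item labels
# from the inner index level into columns; absent combinations become 0.
counts = df.groupby("franchise")["items"].value_counts().unstack(fill_value=0)

# unstack orders the columns alphabetically; reorder them to match
# the desired layout (dog, cat, bird, fish).
counts = counts.reindex(columns=["dog", "cat", "bird", "fish"], fill_value=0)
print(counts)

# Alternative: a cross-tabulation of franchise against items.
alt = pd.crosstab(df["franchise"], df["items"])
alt = alt.reindex(columns=["dog", "cat", "bird", "fish"], fill_value=0)
```

With matplotlib installed, counts.plot(kind="bar") should then draw one group of bars per franchise, which is the chart described above.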
Using `value_counts()` instead of `Counter` really tightened the whole thing up. –
@NickilMaveli - Thank you. – jezrael
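This is a separate question: suppose there are 5 categories, one of which is `NaN`. How do I treat the NaN values as a separate category? `df.groupby("franchise")['items'].value_counts().unstack(fill_value=0)` does not do this. – ShanZhengYang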
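On the NaN follow-up, one possible approach (a sketch, not something confirmed in this thread) is to replace the missing values with an explicit label before counting, so they are tallied like any other category. The small DataFrame and the label "missing" below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data (not from the question): two rows have no item recorded.
df = pd.DataFrame({
    "franchise": ["franchise_A", "franchise_A", "franchise_A",
                  "franchise_B", "franchise_B"],
    "items": ["dog", np.nan, "dog", "bird", np.nan],
})

# value_counts() skips NaN by default, so give missing entries a
# placeholder label first ("missing" is an arbitrary, hypothetical choice).
labelled = df.assign(items=df["items"].fillna("missing"))

counts_with_nan = (
    labelled.groupby("franchise")["items"]
    .value_counts()
    .unstack(fill_value=0)
)
print(counts_with_nan)
```

The placeholder then shows up as its own column next to the real categories.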