2017-06-29 138 views
2

Dictionary from a MultiIndex pandas DataFrame

I have a DataFrame as follows:

import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
    'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'],
    'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}

df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'preTestScore', 'postTestScore'])

If I group by two columns and count the size of each group,

df.groupby(['regiment','company']).size() 

I get the following:

regiment    company
Dragoons    1st        2
            2nd        2
Nighthawks  1st        2
            2nd        2
Scouts      1st        2
            2nd        2
dtype: int64

What I want as output is a dictionary like this:

{'Dragoons':{'1st':2,'2nd':2}, 
'Nighthawks': {'1st':2,'2nd':2}, 
    ... } 

I have tried different approaches, but to no avail. Is there a reasonably clean way to achieve this?

Thank you very much in advance!

Answers

2

You can use Series.unstack followed by DataFrame.to_dict:

d = df.groupby(['regiment','company']).size().unstack().to_dict(orient='index') 
print (d) 
{'Dragoons': {'2nd': 2, '1st': 2}, 
'Nighthawks': {'2nd': 2, '1st': 2}, 
'Scouts': {'2nd': 2, '1st': 2}} 
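To see why this works, it can help to look at each intermediate step separately. The following is a minimal sketch on a reduced, hypothetical sample (same column names as the question, fewer rows); the variable names `sizes` and `wide` are illustrative only.

```python
import pandas as pd

# Reduced, hypothetical sample using the question's column names.
df = pd.DataFrame({
    'regiment': ['Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons'],
    'company': ['1st', '2nd', '1st', '2nd'],
})

# Step 1: a Series whose index is a two-level MultiIndex (regiment, company).
sizes = df.groupby(['regiment', 'company']).size()

# Step 2: move the inner level ('company') from the index into the columns.
wide = sizes.unstack()

# Step 3: orient='index' maps each row label to a {column: value} dict.
d = wide.to_dict(orient='index')
print(d)
```

Because every regiment here has every company, the unstacked frame has no gaps and the counts stay integers.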

Another solution, very similar to the other answer:

from collections import Counter

d = {i: dict(Counter(x['company'])) for i, x in df.groupby('regiment')}
print (d)
{'Dragoons': {'2nd': 2, '1st': 2}, 
'Nighthawks': {'2nd': 2, '1st': 2}, 
'Scouts': {'2nd': 2, '1st': 2}} 

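However, the first solution has a problem with NaNs (depending on the data): if one regiment lacks a company that another regiment has, unstack fills the gap with NaN and the counts become floats.

Sample: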

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'], 
    'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '3rd'], 
    'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'], 
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3], 
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]} 

df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'preTestScore', 'postTestScore']) 
print (df) 
      regiment company      name  preTestScore  postTestScore
0   Nighthawks     1st    Miller             4             25
1   Nighthawks     1st  Jacobson            24             94
2   Nighthawks     2nd       Ali            31             57
3   Nighthawks     2nd    Milner             2             62
4     Dragoons     1st     Cooze             3             70
5     Dragoons     1st     Jacon             4             25
6     Dragoons     2nd    Ryaner            24             94
7     Dragoons     2nd      Sone            31             57
8       Scouts     1st     Sloan             2             62
9       Scouts     1st     Piger             3             70
10      Scouts     2nd     Riani             2             62
11      Scouts     3rd       Ali             3             70

df1 = df.groupby(['regiment','company']).size().unstack() 
print (df1) 
company     1st  2nd  3rd
regiment
Dragoons    2.0  2.0  NaN
Nighthawks  2.0  2.0  NaN
Scouts      2.0  1.0  1.0

d = df1.to_dict(orient='index') 
print (d) 
{'Dragoons': {'3rd': nan, '2nd': 2.0, '1st': 2.0}, 
'Nighthawks': {'3rd': nan, '2nd': 2.0, '1st': 2.0}, 
'Scouts': {'3rd': 1.0, '2nd': 1.0, '1st': 2.0}} 

In that case you need to use:

d = {i: dict(Counter(x['company'])) for i, x in df.groupby('regiment')} 
print (d) 
{'Dragoons': {'2nd': 2, '1st': 2}, 
'Nighthawks': {'2nd': 2, '1st': 2}, 
'Scouts': {'3rd': 1, '2nd': 1, '1st': 2}} 

Or the other answer, by John Galt.
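If you would rather stay with the unstack approach, two possible workarounds are sketched below on a small hypothetical sample. This is only an illustration, and the names `with_zeros` and `without_missing` are made up for the example. The first passes `fill_value=0` to `Series.unstack`, so missing combinations appear with a count of 0; the second drops the NaN entries row by row, so missing combinations do not appear as keys at all.

```python
import pandas as pd

# Hypothetical sample: only 'Scouts' has a '3rd' company.
df = pd.DataFrame({
    'regiment': ['Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts'],
    'company': ['1st', '2nd', '1st', '2nd', '3rd'],
})
sizes = df.groupby(['regiment', 'company']).size()

# Option 1: fill missing (regiment, company) combinations with 0,
# which keeps the integer dtype instead of introducing NaN.
with_zeros = sizes.unstack(fill_value=0).to_dict(orient='index')

# Option 2: unstack normally, then drop the NaN cells of each row
# and cast the remaining counts back to int.
wide = sizes.unstack()
without_missing = {
    regiment: row.dropna().astype(int).to_dict()
    for regiment, row in wide.iterrows()
}

print(with_zeros)
print(without_missing)
```

Which option fits depends on whether a missing company should read as a count of zero or be absent from the inner dictionary.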

+0

Thank you very much. Good to know how to use unpack. – user4279562

+0

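I found a problem with my first answer: it only works when every group contains all categories (as in your sample data). So the second answer, or the other solution, is more general... – jezrael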

+1

I see. I ended up choosing the second solution because it does not generate keys with NaNs. – user4279562

1

How about building the dictionary with a dict comprehension over the groups?

In [409]: {g:v['company'].value_counts().to_dict() for g, v in df.groupby('regiment')} 
Out[409]: 
{'Dragoons': {'1st': 2, '2nd': 2}, 
'Nighthawks': {'1st': 2, '2nd': 2}, 
'Scouts': {'1st': 2, '2nd': 2}} 
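A related sketch, assuming you already have the `size()` result from the question: its index entries are `(regiment, company)` tuples, so a single pass over `Series.items()` can fill the nested dictionary directly. The sample data and the name `nested` are hypothetical.

```python
import pandas as pd

# Hypothetical sample: only 'Scouts' has a '3rd' company.
df = pd.DataFrame({
    'regiment': ['Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts'],
    'company': ['1st', '2nd', '1st', '2nd', '3rd'],
})
sizes = df.groupby(['regiment', 'company']).size()

nested = {}
# Each index label of the MultiIndex Series is a (regiment, company) tuple.
for (regiment, company), n in sizes.items():
    # Create the inner dict on first sight of a regiment, then store the count.
    nested.setdefault(regiment, {})[company] = int(n)

print(nested)
```

With plain string columns as here, only combinations that occur in the data show up, so no NaN keys are produced.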
3

You can group as needed and then reset the index. The code below gives the desired output.

df = df.groupby(['regiment','company']).size().reset_index() 
print(pd.pivot_table(df, values=0, index='regiment', columns='company').to_dict(orient='index')) 

Output:

{'Nighthawks': {'2nd': 2, '1st': 2}, 'Scouts': {'2nd': 2, '1st': 2}, 'Dragoons': {'2nd': 2, '1st': 2}} 
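Two details of this approach are easy to miss: after `reset_index()` the count column is named `0`, and `pivot_table` aggregates with the mean by default. The variant below is a hedged sketch on a hypothetical sample that names the count column explicitly (`n` is an arbitrary choice), sums instead of averaging, and fills missing combinations with 0 rather than NaN.

```python
import pandas as pd

# Hypothetical sample: only 'Scouts' has a '3rd' company.
df = pd.DataFrame({
    'regiment': ['Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts'],
    'company': ['1st', '2nd', '1st', '2nd', '3rd'],
})

# Give the size column an explicit name instead of relying on the default 0.
counts = df.groupby(['regiment', 'company']).size().reset_index(name='n')

# aggfunc='sum' leaves the counts as they are (one row per pair);
# fill_value=0 replaces missing (regiment, company) pairs.
table = pd.pivot_table(counts, values='n', index='regiment',
                       columns='company', aggfunc='sum', fill_value=0)

d = table.to_dict(orient='index')
print(d)
```

Using a separate name such as `counts` also keeps the original `df` intact for later steps.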
+0

Thanks. This is great! – user4279562