pandas groupby to nested JSON

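I often use pandas groupby to generate stacked tables. But then I often want to output the resulting nested relations as JSON. Is there any way to extract a nested JSON file from the stacked table it produces?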

Say I have a df like:

year office candidate amount 
2010 mayor joe smith 100.00 
2010 mayor jay gould 12.00 
2010 govnr pati mara 500.00 
2010 govnr jess rapp 50.00 
2010 govnr jess rapp 30.00

I can do:

# groupby takes the grouping columns as a single list
grouped = df.groupby(['year', 'office', 'candidate']).sum()

print(grouped)
                       amount
year office candidate
2010 mayor  joe smith     100
            jay gould      12
     govnr  pati mara     500
            jess rapp      80

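Beautiful! Of course, what I'd really like to do is get nested JSON via a command along the lines of grouped.to_json. But that feature isn't available. Any workarounds?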

So, what I really want is something like:

{"2010": {"mayor": [
            {"joe smith": 100},
            {"jay gould": 12}
          ],
          "govnr": [
            {"pati mara": 500},
            {"jess rapp": 80}
          ]
         }
}

Don


2014-06-23 Don

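The code above doesn't actually work, because the amount column (e.g. '$ 30') holds strings, so the values are added as strings rather than as numbers. Also, it's not clear what JSON output you want; why doesn't to_json work for you? –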
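To illustrate the point about string amounts, here is a minimal sketch that converts such a column to numbers before summing. It assumes the strings carry a dollar sign and a space as in the '$ 30' example; the two-row frame is hypothetical.

```python
import pandas as pd

# Hypothetical frame whose amounts are strings like '$ 30'.
df = pd.DataFrame({
    "candidate": ["jess rapp", "jess rapp"],
    "amount": ["$ 50", "$ 30"],
})

# Strip the currency prefix and surrounding spaces, then convert to a numeric dtype.
cleaned = df["amount"].str.replace("$", "", regex=False).str.strip()
df["amount"] = pd.to_numeric(cleaned)

# With a numeric column, sum() adds the values instead of joining strings.
total = df["amount"].sum()
print(total)
```

After the conversion, groupby(...).sum() aggregates the amounts numerically.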

@AndyHayden Good points. I've edited to fix/clarify. – Don

@Don Is there any workaround? – skycrew

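I don't think there is anything built into pandas to create a nested dictionary of the data. Below is some code that should work in general for a Series with a MultiIndex, using a defaultdict.

The nesting code loops through each level of the MultiIndex, adding layers to the dictionary until the deepest layer is assigned the Series value.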

In [99]: from collections import defaultdict

In [100]: results = defaultdict(lambda: defaultdict(dict))

In [101]: for index, value in grouped.itertuples():
     ...:     for i, key in enumerate(index):
     ...:         if i == 0:
     ...:             nested = results[key]
     ...:         elif i == len(index) - 1:
     ...:             nested[key] = value
     ...:         else:
     ...:             nested = nested[key]

In [102]: results
Out[102]: defaultdict(<function <lambda> at 0x7ff17c76d1b8>, {2010: defaultdict(<type 'dict'>, {'govnr': {'pati mara': 500.0, 'jess rapp': 80.0}, 'mayor': {'joe smith': 100.0, 'jay gould': 12.0}})})

In [105]: import json

In [106]: print(json.dumps(results, indent=4))
{
    "2010": {
        "govnr": {
            "pati mara": 500.0,
            "jess rapp": 80.0
        },
        "mayor": {
            "joe smith": 100.0,
            "jay gould": 12.0
        }
    }
}
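The session above is from a Python 2 environment. As a rough sketch of the same idea for Python 3 and a recent pandas, the version below walks each MultiIndex tuple with setdefault, so it is not tied to three levels. The helper name nest is hypothetical, and converting keys to str and values to float is an assumption made so that json.dumps accepts NumPy scalars.

```python
import json
import pandas as pd

def nest(series):
    """Turn a Series with a MultiIndex into nested plain dicts."""
    result = {}
    for index, value in series.items():
        node = result
        # Walk every level except the last, creating dicts as needed.
        for key in index[:-1]:
            node = node.setdefault(str(key), {})
        # Cast NumPy scalars to Python floats so json can serialize them.
        node[str(index[-1])] = float(value)
    return result

df = pd.DataFrame({
    "year": [2010] * 5,
    "office": ["mayor", "mayor", "govnr", "govnr", "govnr"],
    "candidate": ["joe smith", "jay gould", "pati mara", "jess rapp", "jess rapp"],
    "amount": [100.0, 12.0, 500.0, 50.0, 30.0],
})
grouped = df.groupby(["year", "office", "candidate"]).sum()

# Select the single value column so each item is an (index tuple, value) pair.
nested = nest(grouped["amount"])
print(json.dumps(nested, indent=4))
```

Using plain dicts with setdefault instead of nested defaultdicts keeps the result directly serializable and avoids fixing the depth in advance.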


2014-06-24 00:38:42 chrisb

Very nice - thanks! – Don

@chrisb I'm trying to adapt your answer to a similar question here, but I'm getting tripped up on grouped.itertuples(): http://stackoverflow.com/questions/37819622/valueerror-too-many-values-to-unpack-when-using-itertuples-on-pandas-datafram/37819973#37819973 – spaine
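One likely cause of a "too many values to unpack" error here (an assumption, since the linked question is not reproduced in this thread): itertuples() yields one tuple per row holding the index plus every column, so unpacking into exactly two names only works when the frame has a single column. The sketch below uses a hypothetical second value column, donors, to show the difference, and iterates over one selected column instead.

```python
import pandas as pd

df = pd.DataFrame({
    "office": ["mayor", "govnr"],
    "candidate": ["joe smith", "pati mara"],
    "amount": [100.0, 500.0],
    "donors": [3, 7],  # hypothetical second value column
})
grouped = df.groupby(["office", "candidate"]).sum()

# Each row tuple is (index, amount, donors): three fields, so
# "for index, value in grouped.itertuples()" cannot unpack it.
row_lengths = [len(row) for row in grouped.itertuples()]

# Selecting one column gives (index tuple, value) pairs,
# no matter how many other columns the frame has.
pairs = list(grouped["amount"].items())
print(row_lengths)
print(pairs)
```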

I know this is an old question, but I recently ran into the same problem. Here is my solution. I borrowed a lot from chrisb's example (thanks!).

The advantage is that you can pass a lambda to get the final value from whatever enumerable you want, as well as a lambda for each group.

from collections import defaultdict

def dict_from_enumerable(enumerable, final_value, *groups):
    # Two defaultdict layers over a plain dict: handles up to three group levels.
    d = defaultdict(lambda: defaultdict(dict))
    group_count = len(groups)
    for item in enumerable:
        nested = d
        # final_value and each group may be a callable or a key passed to item.get()
        item_result = final_value(item) if callable(final_value) else item.get(final_value)
        for i, group in enumerate(groups, start=1):
            group_val = str(group(item) if callable(group) else item.get(group))
            if i == group_count:
                # Deepest level: store the value itself
                nested[group_val] = item_result
            else:
                nested = nested[group_val]
    return d

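For the question, you would call the function like this (the rows are converted to dicts first, because the tuples produced by itertuples() have no .get method):

dict_from_enumerable(grouped.reset_index().to_dict('records'), 'amount', 'year', 'office', 'candidate')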

The first argument can just as well be a plain array of data; pandas is not even required.
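As a sketch of that pandas-free usage, the example below feeds a list of plain dicts to the function and passes a lambda for the last group. The rows are hypothetical, pre-aggregated data, and the function body is repeated only so the example runs on its own.

```python
from collections import defaultdict
import json

def dict_from_enumerable(enumerable, final_value, *groups):
    # Same logic as the function in this answer, repeated for a standalone run.
    d = defaultdict(lambda: defaultdict(dict))
    group_count = len(groups)
    for item in enumerable:
        nested = d
        item_result = final_value(item) if callable(final_value) else item.get(final_value)
        for i, group in enumerate(groups, start=1):
            group_val = str(group(item) if callable(group) else item.get(group))
            if i == group_count:
                nested[group_val] = item_result
            else:
                nested = nested[group_val]
    return d

# Hypothetical pre-aggregated rows as plain dicts; no pandas involved.
rows = [
    {"year": 2010, "office": "mayor", "candidate": "joe smith", "amount": 100.0},
    {"year": 2010, "office": "govnr", "candidate": "jess rapp", "amount": 80.0},
]

# Groups may be keys or callables; here a lambda upper-cases the candidate name.
result = dict_from_enumerable(rows, "amount", "year", "office",
                              lambda row: row["candidate"].upper())
print(json.dumps(result, indent=4))
```

Note that the function overwrites rather than sums when two rows share the same group keys, so the input should already be aggregated.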


2017-07-26 15:28:42
