2017-07-25 47 views

How do I create lists of records grouped by index in pandas?

I have a CSV of records:

name,credits,email 
bob,,[email protected] 
bob,6.0,[email protected] 
bill,3.0,[email protected] 
bill,4.0,[email protected] 
tammy,5.0,[email protected] 

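where name is the index. Because several records share the same name, I'd like to roll each entire row (minus the name) into a list, producing JSON of the form: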

{
    "bob": [
        { "credits": null, "email": "[email protected]" },
        { "credits": 6.0, "email": "[email protected]" }
    ],
    // ...
}

My current solution is a bit kludgey, because it seems to use pandas only as a tool for reading the CSV, but it does produce the JSON-ish output I expect:

#!/usr/bin/env python3 

import io 
import pandas as pd 
from pprint import pprint 
from collections import defaultdict 

def read_data(): 
    s = """name,credits,email 
bob,,[email protected] 
bob,6.0,[email protected] 
bill,3.0,[email protected] 
bill,4.0,[email protected] 
tammy,5.0,[email protected] 
""" 

    data = io.StringIO(s) 
    return pd.read_csv(data) 

if __name__ == "__main__": 
    df = read_data() 
    columns = df.columns 
    index_name = "name" 
    print(df.head()) 

    records = defaultdict(list) 

    name_index = list(columns.values).index(index_name) 
    columns_without_index = [column for i, column in enumerate(columns) if i != name_index] 

    for record in df.values: 
        name = record[name_index]
        record_without_index = [field for i, field in enumerate(record) if i != name_index]
        remaining_record = {k: v for k, v in zip(columns_without_index, record_without_index)}
        records[name].append(remaining_record)
    pprint(dict(records)) 

Is there a way to do the same thing natively in pandas (and numpy)?

Answers

Is this what you want?

cols = df.columns.drop('name').tolist() 

Or, as @jezrael suggested:

cols = df.columns.difference(['name']) 
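
The two ways of building cols are not quite interchangeable, which a small sketch can show. The frame below is made up for illustration (placeholder email, columns deliberately out of alphabetical order); only Index.drop and Index.difference come from the answer.

```python
import pandas as pd

# Hypothetical one-row frame; the email value is a placeholder.
df = pd.DataFrame({"name": ["bob"], "email": ["bob@example.com"], "credits": [6.0]})

# Index.drop removes the label and keeps the remaining columns in their original order.
cols_drop = df.columns.drop("name").tolist()

# Index.difference is a set operation; by default it returns the labels sorted.
cols_diff = df.columns.difference(["name"])

print(cols_drop)
print(list(cols_diff))
```

If the column order in the resulting records matters, the drop variant is probably the safer choice.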

Then:

s = df.groupby('name')[cols].apply(lambda x: x.to_dict('records')).to_json() 

To print it nicely:

In [45]: print(json.dumps(json.loads(s), indent=2)) 
{
  "bill": [
    {
      "credits": 3.0,
      "email": "[email protected]"
    },
    {
      "credits": 4.0,
      "email": "[email protected]"
    }
  ],
  "bob": [
    {
      "credits": null,
      "email": "[email protected]"
    },
    {
      "credits": 6.0,
      "email": "[email protected]"
    }
  ],
  "tammy": [
    {
      "credits": 5.0,
      "email": "[email protected]"
    }
  ]
}
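
For anyone who wants to try the whole thing outside an IPython session, here is a minimal end-to-end sketch of the same approach. The CSV contents (in particular the email addresses) are placeholders invented for this example, and it spells out the 'records' orient rather than the 'r' shorthand, which newer pandas versions may not accept.

```python
import io
import json

import pandas as pd

# Made-up sample data; the email addresses are placeholders.
CSV = """name,credits,email
bob,,bob@example.com
bob,6.0,bob2@example.com
bill,3.0,bill@example.com
"""

df = pd.read_csv(io.StringIO(CSV))

# Every column except the grouping key.
cols = df.columns.drop("name").tolist()

# One list of row dicts per name; to_json serializes the resulting Series.
grouped = df.groupby("name")[cols].apply(lambda g: g.to_dict("records"))
result = json.loads(grouped.to_json())

print(json.dumps(result, indent=2))
```

Going through to_json rather than calling json.dumps on the raw dicts matters here: the missing credits value is NaN inside pandas, and to_json writes it as null, which is what the question asked for.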

Almost! It would be nice not to have to list the columns explicitly after the groupby, but I think this is simple enough. – erip

@erip, I've updated my post - please check... – MaxU

Perfect! Thanks so much for your help! – erip