2014-01-22 72 views

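Convert a flat tab-delimited file to a nested JSON structure

I need to convert a flat file in the following format to JSON. The input and the desired output are shown below. I came across this: Create nested JSON from CSV. However, I have an extra field, level, which determines the nesting structure in the JSON output. Python pandas does have df.to_json, but I couldn't find a way to produce the desired output format. Any help would be appreciated.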

Input:

name level children size 
aaa 7 aaab 2952 
aaa 7 aaac 251 
aaa 7 aaad 222 
aaab 8 xxx 45 
aaab 8 xxy 29 
aaab 8 xxz 28 
aaab 8 xxa 4 
aaac 8 ddd 7 
aaac 8 xxt 4 
aaac 8 xxu 1 
aaac 8 xxv 1 
ddd 9 ppp 4 
ddd 9 qqq 2 

Output:

{ 
"name": "aaa", 
"size": 5000, 
"children": 
    [ 
     { 
     "name": "aaab", 
     "size": 2952, 
     "children": [ 
        {"name": "xxx", "size": 45}, 
        {"name": "xxy", "size": 29}, 
        {"name": "xxz", "size": 28}, 
        {"name": "xxa", "size": 4} 
        ] 
     }, 

     { 
     "name": "aaac", 
     "size": 251, 
     "children": [ 
         { 
         "name": "ddd", 
         "size": 7, 
         "children": [ 
            {"name": "ppp", "size": 4}, 
            {"name": "qqq", "size": 2} 
            ] 
         }, 
         {"name": "xxt", "size": 4}, 
         {"name": "xxu", "size": 1}, 
         {"name": "xxv", "size": 1} 
        ] 
     }, 
     {"name": "aaad","size": 222} 
    ] 
} 

How did you determine that the size of "aaa" is 5000? – Kevin

Answer


This is fairly simple to do with a two-pass approach: first, construct a node for each individual line. Then, connect each node to its children.

import pprint

with open("data.txt") as f:
    # skip blank lines (such as one left by a trailing newline),
    # so that line.split() below always yields four fields
    lines = [line for line in f.read().splitlines() if line.strip()]

# remove the header line
lines = lines[1:]

entries = {}

# create an entry for each child node
for line in lines:
    name, level, child, size = line.split()
    entries[child] = {"name": child, "size": int(size), "children": []}

# we now have an entry for every node that is a child of another node,
# but not for the topmost parent node, so we'll make one for it now
parents = set(line.split()[0] for line in lines)
children = set(line.split()[2] for line in lines)
top_parent = (parents - children).pop()
# (just guess the size, since it isn't supplied in the file)
entries[top_parent] = {"name": top_parent, "size": 5000, "children": []}

# hook up each entry to its children
for line in lines:
    name, level, child, size = line.split()
    entries[name]["children"].append(entries[child])

# the nested structure is ready to use!
structure = entries[top_parent]

# display the beautiful result
pprint.pprint(structure)

Result:

{'children': [{'children': [{'children': [], 'name': 'xxx', 'size': 45}, 
          {'children': [], 'name': 'xxy', 'size': 29}, 
          {'children': [], 'name': 'xxz', 'size': 28}, 
          {'children': [], 'name': 'xxa', 'size': 4}], 
       'name': 'aaab', 
       'size': 2952}, 
       {'children': [{'children': [{'children': [], 
              'name': 'ppp', 
              'size': 4}, 
              {'children': [], 
              'name': 'qqq', 
              'size': 2}], 
          'name': 'ddd', 
          'size': 7}, 
          {'children': [], 'name': 'xxt', 'size': 4}, 
          {'children': [], 'name': 'xxu', 'size': 1}, 
          {'children': [], 'name': 'xxv', 'size': 1}], 
       'name': 'aaac', 
       'size': 251}, 
       {'children': [], 'name': 'aaad', 'size': 222}], 
'name': 'aaa', 
'size': 5000} 
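
The question asks for JSON, whereas pprint shows Python's own representation of the dict (single quotes). A minimal sketch of the serialization step, assuming the standard-library json module is acceptable; the `structure` below is a hand-written subset of the real result so that the snippet runs on its own:

```python
import json

# A small hand-written subset of the structure built above,
# included only so this snippet is self-contained.
structure = {
    "name": "aaa",
    "size": 5000,
    "children": [
        {"name": "aaab", "size": 2952, "children": []},
        {"name": "aaad", "size": 222, "children": []},
    ],
}

# json.dumps produces double-quoted JSON text; indent=4 pretty-prints it.
json_text = json.dumps(structure, indent=4)
print(json_text)

# Round trip: parsing the text gives back an equal structure.
assert json.loads(json_text) == structure
```

To write to a file instead of a string, json.dump(structure, some_open_file, indent=4) takes the same arguments.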

Edit: you can remove the children attribute from leaf nodes by using a del statement.

# execute this after the "hook up each entry to its children" section.
# remove "children" from leaf nodes.
# (values() works on both Python 2 and 3; itervalues() exists only on Python 2)
for entry in entries.values():
    if not entry["children"]:
        del entry["children"]

Result:

{'children': [{'children': [{'name': 'xxx', 'size': 45}, 
          {'name': 'xxy', 'size': 29}, 
          {'name': 'xxz', 'size': 28}, 
          {'name': 'xxa', 'size': 4}], 
       'name': 'aaab', 
       'size': 2952}, 
       {'children': [{'children': [{'name': 'ppp', 'size': 4}, 
              {'name': 'qqq', 'size': 2}], 
          'name': 'ddd', 
          'size': 7}, 
          {'name': 'xxt', 'size': 4}, 
          {'name': 'xxu', 'size': 1}, 
          {'name': 'xxv', 'size': 1}], 
       'name': 'aaac', 
       'size': 251}, 
       {'name': 'aaad', 'size': 222}], 
'name': 'aaa', 
'size': 5000} 
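
As an alternative to deleting empty lists afterwards, the same two passes can be written so that a "children" key is only created when a node actually receives a child. This is a sketch under assumptions, not part of the original answer: `build_tree`, `DATA` and `root_size` are hypothetical names, the data is an inline excerpt of the question's input, and the file is assumed to contain exactly one root.

```python
# Inline excerpt of the question's input, so the sketch runs without data.txt.
DATA = """name level children size
aaa 7 aaab 2952
aaa 7 aaad 222
aaab 8 xxx 45
aaab 8 xxy 29
"""


def build_tree(text, root_size=None):
    """Two-pass build; leaf nodes never get a "children" key."""
    # Drop the header line and any blank lines, then split into fields.
    rows = [line.split() for line in text.splitlines()[1:] if line.strip()]

    # First pass: one node per child, without a "children" key.
    nodes = {}
    for parent, level, child, size in rows:
        nodes[child] = {"name": child, "size": int(size)}

    # The root is the parent that never appears as a child.
    # Its size is not in the file, so the caller has to supply it.
    root_name = ({row[0] for row in rows} - {row[2] for row in rows}).pop()
    nodes[root_name] = {"name": root_name, "size": root_size}

    # Second pass: setdefault creates the list on first use only.
    for parent, level, child, size in rows:
        nodes[parent].setdefault("children", []).append(nodes[child])

    return nodes[root_name]


tree = build_tree(DATA, root_size=5000)
print(tree)
```

Because each node is created with "name" and "size" first and "children" is added later, the keys also come out in that order on Python 3.7 and later, where dicts preserve insertion order.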

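Thanks @Kevin. The output is not in the desired format. Two things are undesirable here: 1. items with 'children': [] are not needed in the output, and 2. the ordering has changed. – user1140126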


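Do you mean the ordering of the name/size/children attributes? Dictionaries in Python are inherently unordered, so it's just a quirk of my interpreter that it prints them this way. I couldn't change the order even if I tried. – Kevin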
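
On the key-order point, one possible workaround (an assumption beyond what the thread itself proposes) is collections.OrderedDict, which remembers insertion order, combined with json.dumps, which writes keys in the order the mapping yields them unless sort_keys=True is passed. The helper `make_node` below is a hypothetical name used only for illustration:

```python
import json
from collections import OrderedDict


def make_node(name, size, children=None):
    """Hypothetical helper: keys are inserted as name, size, children."""
    node = OrderedDict()
    node["name"] = name
    node["size"] = size
    if children:  # leaf nodes get no "children" key at all
        node["children"] = children
    return node


parent = make_node("ddd", 7, [make_node("ppp", 4), make_node("qqq", 2)])

text = json.dumps(parent)
print(text)
# prints: {"name": "ddd", "size": 7, "children": [{"name": "ppp", "size": 4}, {"name": "qqq", "size": 2}]}
```

On Python 3.7 and later a plain dict also preserves insertion order, so OrderedDict is mainly needed on older interpreters.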
