2017-02-22 65 views
0

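Populating a nested dictionary with spreadsheet data

I have an idea of what I want my Python list of dictionaries to look like, but I am having trouble pulling the spreadsheet data into that structure. The problem is that a single row may hold data for both the parent dictionary's values and one child. For subsequent rows, if the parent columns are empty, the child columns are assumed to belong to the previous parent. If we reach a new row whose parent columns are not empty, it should be treated as a new parent to add to the list.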

Here is an example of what the spreadsheet looks like:

+--------------+-------------------+---------+----------+--------------------+--------------+------------------+-------------+---------------+----------------+
| name         | descr             | adminSt | authSt   | server_hostname_ip | server_descr | server_preferred | server_EPG  | server_minPol | server_maxPoll |
+--------------+-------------------+---------+----------+--------------------+--------------+------------------+-------------+---------------+----------------+
| test1-NTPPOL | Test NTP Policy   | enabled | disabled | 10.10.10.10        | NTP1 server  | yes              | oob-default | 4             | 6              |
+--------------+-------------------+---------+----------+--------------------+--------------+------------------+-------------+---------------+----------------+
|              |                   |         |          | 10.10.10.11        | NTP2 server  | no               | oob-default | 4             | 6              |
+--------------+-------------------+---------+----------+--------------------+--------------+------------------+-------------+---------------+----------------+
|              |                   |         |          | 10.10.10.12        | NTP3 server  | no               | oob-default | 4             | 6              |
+--------------+-------------------+---------+----------+--------------------+--------------+------------------+-------------+---------------+----------------+
| test2-NTPPOL | Test 2 NTP policy | enabled | disabled | 20.10.10.10        | NTP1 server  | yes              | oob-default | 4             | 6              |
+--------------+-------------------+---------+----------+--------------------+--------------+------------------+-------------+---------------+----------------+
|              |                   |         |          | 20.10.10.11        | NTP2 server  | no               | oob-default | 4             | 6              |
+--------------+-------------------+---------+----------+--------------------+--------------+------------------+-------------+---------------+----------------+
|              |                   |         |          | 20.10.10.12        | NTP3 server  | no               | oob-default | 4             | 6              |
+--------------+-------------------+---------+----------+--------------------+--------------+------------------+-------------+---------------+----------------+

I would like the data structure to look like this:

[
    {
        "name": "NTP_Policy1",
        "descr": "NTP Policy 1",
        "adminSt": "enabled",
        "authSt": "disabled",
        "servers": [
            {
                "hostname": "10.10.10.10",
                "descr": "NTP1 Server",
                "preferred": true,
                "server_EPG": "oob-default",
                "minPoll": 4,
                "maxPoll": 6
            },
            {
                "hostname": "20.10.10.10",
                "descr": "NTP2 Server",
                "preferred": false,
                "server_EPG": "oob-default",
                "minPoll": 4,
                "maxPoll": 6
            }
        ]
    },
    {
        "name": "NTP_Policy2",
        "descr": "NTP Policy 2",
        "adminSt": "enabled",
        "authSt": "disabled",
        "servers": [
            {
                "hostname": "30.10.10.10",
                "descr": "NTP3 Server",
                "preferred": true,
                "server_EPG": "oob-default",
                "minPoll": 4,
                "maxPoll": 6
            },
            {
                "hostname": "40.10.10.10",
                "descr": "NTP4 Server",
                "preferred": false,
                "server_EPG": "oob-default",
                "minPoll": 4,
                "maxPoll": 6
            }
        ]
    }
]

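The code below is the closest I have come, but for the subsequent rows it appends the children at the parent level instead of nesting them under the previous parent.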

>>> import openpyxl
>>> from pprint import pprint 
>>> def excel_to_dict(sheet): 
...  rows = sheet.iter_rows() 
...  keys = next(rows) 
...  dict_list = [] 
...  # For each row in the spreadsheet, 
...  # Create an iterator pair so that the key is iterated over at the same time as its matching cell in the row 
...  # Then save that pairing as descriptors of the switch 
...  for row in rows: 
...   dict = {} 
...   dict['servers'] = [] 
...   server_atts = {} 
...   for key,cell in zip(keys, row): 
...    if str(cell.value) != 'None' and str(key.value) == 'name': 
...     dict[str(key.value)] = str(cell.value) 
...     parentKey = str(key.value) 
...    elif (str(cell.value) != 'None' and str(key.value) == 'descr') or (str(cell.value) != 'None' and str(key.value) == 'adminSt') or (str(cell.value) != 'None' and str(key.value) == 'authSt'): 
...     dict[str(key.value)] = str(cell.value) 
...    elif str(cell.value) == 'None': 
...     continue 
...    else: 
...     server_atts[str(key.value)] = str(cell.value) 
...   dict['servers'].append(server_atts.copy()) 
...   dict_list.append(dict.copy()) 
...  return dict_list 
>>> wb = openpyxl.load_workbook('aci_config.xlsx') 
>>> ntpPolsSheet = wb['ntp_pol']
>>> ntpPols = excel_to_dict(ntpPolsSheet) 
>>> 
>>> pprint(ntpPols) 
[{'adminSt': 'enabled', 
    'authSt': 'disabled', 
    'descr': 'Test NTP Policy', 
    'name': 'test1-NTPPOL', 
    'servers': [{'server_EPG': 'oob-default', 
       'server_descr': 'NTP1 server', 
       'server_hostname_ip': '10.10.10.10', 
       'server_maxPoll': '6', 
       'server_minPol': '4', 
       'server_preferred': 'yes'}]}, 
{'servers': [{'server_EPG': 'oob-default', 
       'server_descr': 'NTP2 server', 
       'server_hostname_ip': '10.10.10.11', 
       'server_maxPoll': '6', 
       'server_minPol': '4', 
       'server_preferred': 'no'}]}, 
{'servers': [{'server_EPG': 'oob-default', 
       'server_descr': 'NTP3 server', 
       'server_hostname_ip': '10.10.10.12', 
       'server_maxPoll': '6', 
       'server_minPol': '4', 
       'server_preferred': 'no'}]}, 
{'adminSt': 'enabled', 
    'authSt': 'disabled', 
    'descr': 'Test 2 NTP policy', 
    'name': 'test2-NTPPOL', 
    'servers': [{'server_EPG': 'oob-default', 
       'server_descr': 'NTP1 server', 
       'server_hostname_ip': '20.10.10.10', 
       'server_maxPoll': '6', 
       'server_minPol': '4', 
       'server_preferred': 'yes'}]}, 
{'servers': [{'server_EPG': 'oob-default', 
       'server_descr': 'NTP2 server', 
       'server_hostname_ip': '20.10.10.11', 
       'server_maxPoll': '6', 
       'server_minPol': '4', 
       'server_preferred': 'no'}]}, 
{'servers': [{'server_EPG': 'oob-default', 
       'server_descr': 'NTP3 server', 
       'server_hostname_ip': '20.10.10.12', 
       'server_maxPoll': '6', 
       'server_minPol': '4', 
       'server_preferred': 'no'}]}] 

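What does the code need to look like to populate the list of dictionaries correctly? Is there a better spreadsheet layout that would make the data easier to import? I am trying to do everything on a single sheet rather than across multiple sheets.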

+1

Could you use `pandas` for this? It can achieve the same result in just a few lines of code. –

+0

You should convert it to `json` –
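
To illustrate the JSON suggestion: the structure the question asks for uses booleans and integers, while the cells read in the question's code are all strings. Below is a minimal sketch of that conversion using only the standard library. The helper name `to_server` is hypothetical, and the input keys are assumed to match the spreadsheet headers from the question (including `server_minPol` as spelled there).

```python
import json


def to_server(record):
    """Convert one server record of strings into the typed shape from the question."""
    return {
        "hostname": record["server_hostname_ip"],
        "descr": record["server_descr"],
        # The sheet stores yes/no; the target structure wants true/false.
        "preferred": record["server_preferred"].strip().lower() == "yes",
        "server_EPG": record["server_EPG"],
        "minPoll": int(record["server_minPol"]),
        "maxPoll": int(record["server_maxPoll"]),
    }


# One record exactly as it appears in the pprint output of the question.
raw = {
    "server_EPG": "oob-default",
    "server_descr": "NTP1 server",
    "server_hostname_ip": "10.10.10.10",
    "server_maxPoll": "6",
    "server_minPol": "4",
    "server_preferred": "yes",
}

server = to_server(raw)
text = json.dumps([server], indent=4)
print(text)
```

`json.dumps` writes Python `True`/`False` as `true`/`false`, so the output matches the notation used in the desired structure.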

+0

What problem are you running into? Is the data coming in the way you expect? – aydow

Answers

0

I suggest saving the .xlsx file in CSV format, since that should be easier to work with. In text form it would look like this:

name,descr,adminSt,authSt,server_hostname_ip,server_descr,server_preferred,server_EPG,server_minPoll,server_maxPoll
test1-NTPPOL,Test NTP Policy,enabled,disabled,10.10.10.10,NTP1 server,yes,oob-default,4,6 
,,,,10.10.10.11,NTP2 server,no,oob-default,4,6 
,,,,10.10.10.12,NTP3 server,no,oob-default,4,6 
test2-NTPPOL,Test 2 NTP policy,enabled,disabled,20.10.10.10,NTP1 server,yes,oob-default,4,6 
,,,,20.10.10.11,NTP2 server,no,oob-default,4,6 
,,,,20.10.10.12,NTP3 server,no,oob-default,4,6 

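You can then use pandas to read the CSV and convert it to JSON. pandas has an `.iloc` indexer that lets you select a row by position first and then index that row by column name.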

import pandas as pd
from beeprint import pp

def excel_to_dict(sheet):
    dict_list = []
    last_test_dict = None
    for i in range(len(sheet)):
        # When we find a new row with a name value, we want to insert
        # the old test_dict into the dict_list and make a new test_dict.
        # Also, we want to skip the first row to not append an empty dict.
        if pd.notnull(sheet.iloc[i]['name']):
            if i != 0:
                dict_list.append(test_dict)
            test_dict = {}
            test_dict['name'] = sheet.iloc[i]['name']
            test_dict['descr'] = sheet.iloc[i]['descr']
            test_dict['adminSt'] = sheet.iloc[i]['adminSt']
            test_dict['authSt'] = sheet.iloc[i]['authSt']
            test_dict['servers'] = []
            server_info = {}
            server_info['server_hostname'] = sheet.iloc[i]['server_hostname_ip']
            server_info['server_descr'] = sheet.iloc[i]['server_descr']
            server_info['server_preferred'] = sheet.iloc[i]['server_preferred']
            server_info['server_EPG'] = sheet.iloc[i]['server_EPG']
            server_info['minPoll'] = sheet.iloc[i]['server_minPoll']
            server_info['maxPoll'] = sheet.iloc[i]['server_maxPoll']
            test_dict['servers'].append(server_info)
            last_test_dict = test_dict  # keep a handle to our new dict
        else:
            # Use the handle to the last test dict created to add info
            # about a new server without modifying the name of the test
            server_info = {}
            server_info['server_hostname'] = sheet.iloc[i]['server_hostname_ip']
            server_info['server_descr'] = sheet.iloc[i]['server_descr']
            server_info['server_preferred'] = sheet.iloc[i]['server_preferred']
            server_info['server_EPG'] = sheet.iloc[i]['server_EPG']
            server_info['minPoll'] = sheet.iloc[i]['server_minPoll']
            server_info['maxPoll'] = sheet.iloc[i]['server_maxPoll']
            last_test_dict['servers'].append(server_info)

    # In case we didn't enter the last test dict into the list
    dict_list.append(last_test_dict)
    return dict_list

sheet = pd.read_csv('sheet.csv', sep=',')
pp(excel_to_dict(sheet))
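
As a possible alternative to looping over `.iloc`, the same grouping can be sketched with `groupby`. This is only a sketch under stated assumptions: the CSV text is inlined through `io.StringIO` instead of read from `sheet.csv`, the function name `sheet_to_policies` is hypothetical, the first data row is assumed to carry a name, and the server keys keep their original column names rather than being renamed.

```python
import io

import pandas as pd

# Inline copy of the CSV shown above, so the sketch runs without a file.
CSV_TEXT = """\
name,descr,adminSt,authSt,server_hostname_ip,server_descr,server_preferred,server_EPG,server_minPoll,server_maxPoll
test1-NTPPOL,Test NTP Policy,enabled,disabled,10.10.10.10,NTP1 server,yes,oob-default,4,6
,,,,10.10.10.11,NTP2 server,no,oob-default,4,6
,,,,10.10.10.12,NTP3 server,no,oob-default,4,6
test2-NTPPOL,Test 2 NTP policy,enabled,disabled,20.10.10.10,NTP1 server,yes,oob-default,4,6
,,,,20.10.10.11,NTP2 server,no,oob-default,4,6
,,,,20.10.10.12,NTP3 server,no,oob-default,4,6
"""

PARENT_COLS = ["name", "descr", "adminSt", "authSt"]


def sheet_to_policies(df):
    child_cols = [col for col in df.columns if col not in PARENT_COLS]
    # Every non-blank name bumps the running count, so a parent row and the
    # blank-name rows that follow it share one group number.
    group_id = df["name"].notna().cumsum()
    policies = []
    for _, group in df.groupby(group_id, sort=False):
        # Parent values come from the first row of the group only.
        policy = group.iloc[0][PARENT_COLS].to_dict()
        policy["servers"] = group[child_cols].to_dict("records")
        policies.append(policy)
    return policies


policies = sheet_to_policies(pd.read_csv(io.StringIO(CSV_TEXT)))
```

Taking the parent values from the first row of each group, instead of forward-filling the parent columns, keeps a genuinely blank `descr` on a new policy from inheriting the previous policy's description.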
+0

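This looks perfect. My only question is about `if i != 0:` followed by `dict_list.append(test_dict)`. That seems to mean that whenever we are not on the first row and not on a row with a blank name, we append test_dict to our main dict list. Aren't we then appending whatever was in test_dict the last time we went through this part of the code? After that we grab the data from the row as expected. I just don't understand why you need to append test_dict when we are not at index 0 and have hit a new name entry. – mikey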

+0

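Right. At the start of each iteration, whenever the row we are processing _does_ contain a new test name, we append the previous `test_dict` to `dict_list`. But when we reach the first row, the previous `test_dict` would be empty, and we would be trying to insert an empty dictionary into `dict_list`. So I make sure not to do that when i == 0. This also means the final `test_dict` never gets added to `dict_list` inside the for loop, so I added the line before the return statement. – Chirag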
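
The bookkeeping discussed in these comments (skip the append on the first row, append the last dict after the loop) can also be avoided by appending each parent dict as soon as it is created. Later rows then add servers to that same dict object, which the list already holds. A minimal sketch using the standard `csv` module, assuming a shortened version of the CSV layout and a hypothetical function name `rows_to_policies`:

```python
import csv
import io

# Shortened sample in the same layout: blank parent cells mean
# "same policy as the previous row".
CSV_TEXT = """\
name,descr,adminSt,authSt,server_hostname_ip,server_descr
test1-NTPPOL,Test NTP Policy,enabled,disabled,10.10.10.10,NTP1 server
,,,,10.10.10.11,NTP2 server
test2-NTPPOL,Test 2 NTP policy,enabled,disabled,20.10.10.10,NTP1 server
"""

PARENT_COLS = ("name", "descr", "adminSt", "authSt")


def rows_to_policies(rows):
    policies = []
    for row in rows:
        if row["name"]:
            # New parent: append it immediately. The list stores a reference,
            # so servers added on later rows show up in this same dict.
            policy = {col: row[col] for col in PARENT_COLS}
            policy["servers"] = []
            policies.append(policy)
        if not policies:
            raise ValueError("the first data row must name a policy")
        server = {col: value for col, value in row.items() if col not in PARENT_COLS}
        policies[-1]["servers"].append(server)
    return policies


policies = rows_to_policies(csv.DictReader(io.StringIO(CSV_TEXT)))
```

`csv.DictReader` yields an empty string for a blank cell, which is why a plain truth test on `row["name"]` is enough here.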

+0

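I understand now. Is there any way to generalize this function? I have other worksheets with different columns that would benefit from it, but it is hard-coded for a static layout, so I would need to write different code for each worksheet. Would building the data structure from the last column back to the first be the easiest approach? I ask because I have a table with these headers: site, building, floor, room, row, rack. A site has a name and can have multiple buildings, a building has a name and multiple floors, and so on. – mikey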
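
One possible way to generalize, sketched under assumptions rather than taken from the answer above: describe each nesting level as a pair of a list name and its columns, treat the first column of each level as the one that signals a new item, and keep track of the most recently opened item at every depth. The function name `nest_rows`, the `levels` format, and the sample data are all hypothetical.

```python
def nest_rows(rows, levels):
    """Nest flat rows into a tree.

    rows:   iterable of dicts mapping column name to value (None or "" = blank).
    levels: list of (list_name, columns), outermost level first. The first
            column of each level decides whether a row starts a new item.
            The list_name of the outermost level is unused, because the
            outermost items are returned as the result list itself.
    """
    root = []
    current = []  # current[d] is the most recently opened item at depth d
    for row in rows:
        for depth, (list_name, columns) in enumerate(levels):
            if row.get(columns[0]) in (None, ""):
                continue  # blank: the row belongs to the item already open here
            if depth > len(current):
                raise ValueError(f"{columns[0]!r} is set but its parent level is blank: {row!r}")
            node = {col: row.get(col) for col in columns}
            if depth + 1 < len(levels):
                node[levels[depth + 1][0]] = []  # container for the next level
            siblings = root if depth == 0 else current[depth - 1][list_name]
            siblings.append(node)
            del current[depth:]  # a new item closes everything nested below it
            current.append(node)
    return root


levels = [("sites", ["site"]), ("buildings", ["building"]), ("floors", ["floor"])]
rows = [
    {"site": "HQ", "building": "A", "floor": "1"},
    {"site": None, "building": None, "floor": "2"},
    {"site": None, "building": "B", "floor": "1"},
    {"site": "Branch", "building": "C", "floor": "1"},
]
tree = nest_rows(rows, levels)
```

Under the same assumptions, the NTP sheet would be two levels: one with the columns `name`, `descr`, `adminSt`, `authSt`, and one named `servers` with the `server_*` columns. Working from the first column to the last seems simpler than going from the last column backwards, because a parent always appears on or before the row of its first child.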