2017-05-23 95 views

How do I split data into dictionaries?

I am reading data from an XML file and have to build dictionaries from it. The XML file contains:

<?xml version="1.0" standalone="yes"?> 
<University> 
    <Form> 
    <RNlength>10</RNlength> 
    <ROLLNUMBER> 
     0-0,A-1,A-2,0-3,0-4,A-5,A-6,A-7,0-8,0-9, 
     1-0,B-1,B-2,1-3,1-4,B-5,B-6,B-7,1-8,1-9, 
     2-0,C-1,C-2,2-3,2-4,C-5,C-6,C-7,2-8,2-9, 
     3-0,D-1,D-2,3-3,3-4,D-5,D-6,D-7,3-8,3-9, 
     4-0,E-1,E-2,4-3,4-4,E-5,E-6,E-7,4-8,4-9, 
     5-0,F-1,F-2,5-3,5-4,F-5,F-6,F-7,5-8,5-9, 
     6-0,G-1,G-2,6-3,6-4,G-5,G-6,G-7,6-8,6-9, 
     7-0,H-1,H-2,7-3,7-4,H-5,H-6,H-7,7-8,7-9, 
     8-0,I-1,I-2,8-3,8-4,I-5,I-6,I-7,8-8,8-9, 
     9-0,J-1,J-2,9-3,9-4,J-5,J-6,J-7,9-8,9-9, 
     K-1,K-2,K-5,K-6,K-7,L-1,L-2,L-5,L-6,L-7, 
     M-1,M-2,M-5,M-6,M-7,N-1,N-2,N-5,N-6,N-7, 
     O-1,O-2,O-5,O-6,O-7,P-1,P-2,P-5,P-6,P-7, 
     Q-1,Q-2,Q-5,Q-6,Q-7,R-1,R-2,R-5,R-6,R-7, 
     S-1,S-2,S-5,S-6,S-7,T-1,T-2,T-5,T-6,T-7, 
     U-1,U-2,U-5,U-6,U-7,V-1,V-2,V-5,V-6,V-7, 
     W-1,W-2,W-5,W-6,W-7,X-1,X-2,X-5,X-6,X-7, 
     Y-1,Y-2,Y-5,Y-6,Y-7,Z-1,Z-2,Z-5,Z-6,Z-7 
    </ROLLNUMBER> 
    </Form> 
</University> 

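From the .xml file above, I have to extract data according to the length of the string. After reading the contents of the ROLLNUMBER element, I have to split the data on commas (,). In each resulting token, such as '0-0', the part before the '-' is the label and the part after the '-' is the column number. The total number of columns equals the length of the string (RNlength). For the given .xml data, the dictionaries should look like this: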

data = [ 
    {"0":0,"1":1,"2":2,"3":3,"4":4,"5":5,"6":6,"7":7,"8":8,"9":9}, 
    {"0":"A","1":"B","2":"C","3":"D","4":"E","5":"F","6":"G",.......,"24":"Y","25":"Z"}, 
    {"0":"A","1":"B","2":"C","3":"D","4":"E","5":"F","6":"G",.......,"24":"Y","25":"Z"}, 
    {"0":0,"1":1,"2":2,"3":3,"4":4,"5":5,"6":6,"7":7,"8":8,"9":9}, 
    {"0":0,"1":1,"2":2,"3":3,"4":4,"5":5,"6":6,"7":7,"8":8,"9":9}, 
    {"0":"A","1":"B","2":"C","3":"D","4":"E","5":"F","6":"G",.......,"24":"Y","25":"Z"}, 
    {"0":"A","1":"B","2":"C","3":"D","4":"E","5":"F","6":"G",.......,"24":"Y","25":"Z"}, 
    {"0":"A","1":"B","2":"C","3":"D","4":"E","5":"F","6":"G",.......,"24":"Y","25":"Z"}, 
    {"0":0,"1":1,"2":2,"3":3,"4":4,"5":5,"6":6,"7":7,"8":8,"9":9}, 
    {"0":0,"1":1,"2":2,"3":3,"4":4,"5":5,"6":6,"7":7,"8":8,"9":9} 
] 

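Here, each dictionary key is the index of its value. I have implemented the following, but I don't know how to get the data into dictionary format.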

from xml.dom import minidom 
xmldoc = minidom.parse('demo.xml') 
itemlist = xmldoc.getElementsByTagName('ROLLNUMBER') 
pattern = itemlist[0].firstChild.nodeValue 

l = [x.strip() for x in pattern.split(',')] 

Answers


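Note that in your expected output the last entry should be "25": "Z", because A is indexed from 0.

You can then get there by splitting each token again on '-' and converting the second part to an integer. The inner loop makes sure that a dictionary exists for the column number before it is used: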

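data = []
for token in pattern.split(','):
    # Each token looks like 'A-1': label before the dash, column number after it.
    # strip() removes the newlines and indentation left over from the XML text.
    label, col = token.strip().split('-')
    col = int(col)
    # Make sure a dictionary exists for this column before using it.
    while len(data) < col + 1:
        data.append({})
    # The key is the position of the label within its column, as a string.
    data[col][str(len(data[col]))] = label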

Live test: https://repl.it/IMoU/1
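For completeness, here is a self-contained sketch of the same idea that runs end to end. It rests on a few assumptions that are not in the original answer: it uses `xml.etree.ElementTree` instead of `minidom`, it pre-allocates one dictionary per column from `RNlength`, and it converts purely numeric labels to integers because the expected output in the question shows `0` rather than `"0"`. The names `build_columns` and `SAMPLE_XML` are hypothetical, and the sample is a shortened stand-in for the real file.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample with the same layout as the question's file.
SAMPLE_XML = """<University>
  <Form>
    <RNlength>3</RNlength>
    <ROLLNUMBER>
      0-0,A-1,0-2,
      1-0,B-1,1-2,
      C-1
    </ROLLNUMBER>
  </Form>
</University>"""


def build_columns(xml_text):
    """Return one dict per column, keyed by the label's index within that column."""
    form = ET.fromstring(xml_text).find("Form")
    n_cols = int(form.findtext("RNlength"))
    pattern = form.findtext("ROLLNUMBER")

    # One empty dictionary per column, as announced by RNlength.
    columns = [{} for _ in range(n_cols)]
    for token in pattern.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate a stray trailing comma
        label, col = token.split("-")
        # Assumption: numeric labels become ints, as in the expected output.
        value = int(label) if label.isdigit() else label
        column = columns[int(col)]
        column[str(len(column))] = value
    return columns


data = build_columns(SAMPLE_XML)
print(data)
```

A column number that is equal to or greater than `RNlength` raises an `IndexError` in this version, which may be preferable to silently growing the list if the file is expected to be consistent.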